T cell receptor and B cell receptor repertoire analysis system, and use of same in treatment and diagnosis

ABSTRACT

The repertoire of the variable region of T cell receptors (TCR) or B cell receptors (BCR) is quantitatively analyzed using non-biased gene sequence analysis. The present invention provides the following: a method for quantitatively analyzing the repertoire of the variable region of the T cell receptors (TCR) or B cell receptors (BCR) of a subject by using a database, wherein the method includes (1) a step for providing a nucleic acid sample containing the nucleic acid sequence of T cell receptors (TCR) or B cell receptors (BCR) amplified in a non-biased manner from the subject; (2) a step for determining the nucleic acid sequence contained in the nucleic acid sample; and (3) a step for calculating the frequency of appearance of each gene or combination thereof on the basis of the determined nucleic acid sequence and deriving the TCR or BCR repertoire of the subject.

STATEMENT REGARDING SEQUENCE LISTING

The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 790132 401D1 SEQUENCE LISTING.txt. The text file is 598 KB, was created on Aug. 19, 2021, and is being submitted electronically via EFS-Web.

TECHNICAL FIELD

The present invention relates to a technology for amplifying, without applying a bias, a gene created by gene rearrangement from a biological sample, a system for analyzing the resulting genetic information, and therapy and diagnosis using the same.

BACKGROUND ART

A biological defense mechanism utilizing the immune system is heavily dependent on specific immunity provided mainly by T cells and B cells. T cells and B cells do not react to their own cells or molecules, and are capable of specifically recognizing and attacking exogenous pathogens such as viruses or bacteria. To achieve this, T cells and B cells have a mechanism for recognizing and distinguishing autoantigens and various antigens derived from other organisms by means of receptor molecules expressed on the cell surface. The T cell receptor (TCR) and the B cell receptor (BCR) act as the antigen receptors of T cells and B cells, respectively. Stimulation through such antigen receptors transmits an intracellular signal, which enhances production of inflammatory cytokines, chemokines and the like, increases cell proliferation, and initiates various immune responses. TCRs recognize a peptide bound to the peptide binding groove of a major histocompatibility complex (MHC) molecule expressed on an antigen-presenting cell (peptide-MHC complex, pMHC), thereby distinguishing self from non-self and recognizing an antigen peptide (Non Patent Literature 1). TCRs are heterodimeric receptor molecules consisting of two TCR polypeptide chains. There are αβ TCRs, expressed by normal T cells, and γδ TCRs, which have a special function. The α and β chain TCR molecules form a complex with a plurality of CD3 molecules (CD3ζ chain, CD3ε chain, CD3γ chain, and CD3δ chain), transmit an intracellular signal after antigen recognition, and initiate various immune responses. An endogenous antigen, such as a cancer antigen derived from a cancer cell or a viral antigen produced in a virally infected cell, is presented as an antigen peptide on an MHC class I molecule. Further, an antigen derived from an exogenous microorganism is taken up by an antigen-presenting cell by endocytosis, processed, and then presented on an MHC class II molecule. These antigens are recognized by TCRs expressed by CD8⁺ T cells and CD4⁺ T cells, respectively. It is also known that costimulatory molecules such as CD28, ICOS, and OX40 are important for stimulation via the TCR molecule.

A TCR gene consists of numerous V regions (variable regions, V), J regions (joining regions, J), D regions (diversity regions, D), and constant regions (C regions, C), which are encoded in different regions of the genome. During T cell differentiation, these gene fragments are genetically rearranged in various combinations. α chain and γ chain TCRs express genes consisting of V-J-C, and β chain and δ chain TCRs express genes consisting of V-D-J-C. Currently, the IMGT database (the International ImMunoGeneTics project) lists 43 types of functional α chain TCR V gene fragments (TRAV), 50 types of TCR J gene fragments (TRAJ), 40-42 types of functional β chain TCR V gene fragments (TRBV), 2 types of TCR D gene fragments (TRBD), types of TCR J gene fragments (TRBJ), 4-6 types of functional γ chain V gene fragments (TRGV), 5 types of TCR J gene fragments (TRGJ), 3 types of functional δ chain V gene fragments (TRDV), 3 types of TCR D gene fragments (TRDD), and 4 types of TCR J gene fragments (TRDJ) (Non Patent Literature 2). Diversity is created by rearrangement of these gene fragments. In addition, insertion or deletion of one or more bases between the V and D, D and J, or V and J gene fragments forms a random amino acid sequence, creating even more diverse TCR gene sequences.

The region where a TCR molecule directly binds to the pMHC complex surface (TCR footprint) is composed of three diverse complementarity determining regions (CDRs) within the V region: CDR1, CDR2, and CDR3. The CDR3 region in particular comprises part of the V region, part of the J region, and a V-D-J junction formed by a random sequence, making it the most diverse antigen recognition site. The other regions are called FRs (framework regions) and serve to form the backbone structure of the TCR molecule. During differentiation and maturation of a T cell in the thymus, the β chain TCR gene is rearranged first, and the β chain pairs with a pTα molecule to form a pre-TCR complex. The α chain TCR gene is then rearranged to form an αβ TCR molecule, and when a functional αβ TCR is not formed, rearrangement occurs in the other α chain TCR allele. It is known that, after positive and negative selection in the thymus, TCRs with suitable affinity are selected and acquire antigen specificity (Non Patent Literature 3).

A BCR is a membrane-bound immunoglobulin (Ig) that acts as an antigen receptor molecule. The secreted form of the same protein is released outside the cell as an antibody. Large amounts of antibodies are secreted by terminally differentiated plasma cells, and they eliminate pathogens by binding to pathogenic molecules such as viruses or bacteria, or through subsequent immune reactions such as complement fixation. The BCR is expressed on the B cell surface. After binding to an antigen, the BCR transmits an intracellular signal to initiate various immune responses or cell proliferation. Diversity of amino acid sequences at the antigen-binding site is responsible for the specificity of a BCR. Sequences at the antigen-binding site vary greatly among BCR molecules and are called variable sections (V regions). Meanwhile, the sequence of the constant region (C region) is highly conserved among BCR and antibody molecules. This region carries the effector function of an antibody or the signaling function of a receptor.

A BCR and an antibody are the same except for the presence or absence of a membrane-binding domain. An Ig molecule consists of four polypeptide chains: two heavy chains (H chains) and two light chains (L chains). Within one Ig molecule, the two H chains, and each H chain and its L chain, are linked by disulfide bonds. There are 5 different H chain classes (isotypes) in Ig, the μ, α, γ, δ, and ε chains, which define IgM, IgA, IgG, IgD, and IgE, respectively. It is known that functions and roles generally vary depending on the isotype; e.g., IgG is the highly specific antibody that functions in biological defense, IgA is involved in mucosal immunity, and IgE is important in allergy, asthma, and atopic dermatitis. Furthermore, some isotypes have several subclasses, such as IgG1, IgG2, IgG3, and IgG4. There are two types of L chains, the λ chain (IgL) and the κ chain (IgK), either of which can bind to an H chain of any class, and it is understood that there is no functional difference between them (Non Patent Literature 4).

As with TCR genes, BCR genes are formed by gene rearrangement in somatic cells. A variable section is encoded by several separate gene fragments in the genome, which undergo somatic recombination during cell differentiation. The H chain gene consists of a V region, a D region, and a J region, which together encode the variable section, and a C region (constant region, C) that defines the isotype. Each gene fragment is separated in the genome, but is expressed as a continuous V-D-J-C gene after gene rearrangement. The IMGT database lists 38-44 types of functional IgH chain V gene fragments (IGHV), 23 types of D gene fragments (IGHD), 6 types of J gene fragments (IGHJ), 34 types of functional IgK chain V gene fragments (IGKV), 5 types of J gene fragments (IGKJ), 29-30 types of functional IgL chain V gene fragments (IGLV), and 5 types of J gene fragments (IGLJ). These gene fragments undergo gene rearrangement to generate BCR diversity. Furthermore, as in TCRs, highly diverse CDR3 regions are formed by random insertion or deletion of bases at the junctions (Non Patent Literature 2).

During differentiation and maturation of a B cell, IgM is produced first by immature B cells. A naive B cell that has not been exposed to an antigen co-expresses IgM and IgD. After activation by antigen stimulation, a class switch (isotype switch) occurs that replaces the C region of IgM, Cμ, with the C region of IgG, Cγ, while the sequence of the variable section remains the same. Similarly, Cμ is replaced by the C region of IgA (Cα) or the C region of IgE (Cε) to produce IgA or IgE. Through such class switch recombination, the type of antibody required for eliminating a pathogen is produced where it is required. Furthermore, during proliferation of a B cell that has undergone a class switch, mutations occur at a high frequency in the variable section of IgG, IgA or IgE (somatic hypermutation). As a result, B cells that have acquired higher specificity for the antigen are further stimulated and proliferate, so that antibody-producing B cells with higher specificity are selected through this process (affinity maturation) (Non Patent Literature 5).

Each T cell or B cell produces one type of TCR or BCR with high specificity for a specific antigen. Because a living organism contains numerous antigen-specific T cells and B cells, a diverse TCR or BCR repertoire is formed that functions effectively as a defense mechanism against various pathogens. Thus, analysis of the TCR or BCR repertoire, which is an important indicator of the specificity and diversity of immune cells, is a useful tool for analyzing monoclonality or immune disorders. When T cells or B cells proliferate in response to an antigen, the proportion of a specific TCR or BCR gene within the diverse repertoire is observed to increase (increased clonality). Attempts have been made to detect the development of tumors of lymphoid cells expressing TCRs or BCRs as an increase in clonality by TCR or BCR repertoire analysis (Non Patent Literature 6). Further, it has been reported that the usage frequency of a specific Vβ chain increases upon exposure to molecules, such as superantigens, that selectively stimulate TCRs having that Vβ chain (Non Patent Literature 7). Repertoire analysis is also frequently used to investigate antigen-specific immune responses in refractory autoimmune diseases induced by immune disorders, such as rheumatoid arthritis, systemic lupus erythematosus, Sjögren's syndrome, and idiopathic thrombocytopenic purpura, and its usefulness has been demonstrated.

Conventional TCR repertoire analysis examines how frequently individual V chains are used by the T cells in a sample. One such method analyzes the ratio of T cells expressing individual Vβ chains by flow cytometry (FACS analysis) with Vβ chain-specific antibodies. Since a relatively large number of cells is required, this technology is useful for analyzing peripheral blood, which contains many lymphocytes, but cannot be adapted to tissue samples. Further, since antibodies covering all of the V chains are still not available today, comprehensive analysis is not possible.

In addition, TCR repertoire analysis using molecular biological technology has been designed based on information on TCR genes obtained from the human genomic sequence. In this approach, RNA is extracted from a cell sample to synthesize complementary DNA, and TCR genes are then amplified and quantified by PCR. Conventionally, either numerous individual TCR V chain-specific primers are designed and used separately for quantification by real-time PCR or the like, or such specific primers are used simultaneously in a single amplification (multiplex PCR). However, even with quantification using an endogenous control for each V chain, accurate analysis is not possible when a large number of primers is used. Furthermore, multiplex PCR has the disadvantage that differences in amplification efficiency among primers introduce a bias in PCR amplification. To overcome this disadvantage of multiplex PCR, Tsuruta et al. reported adaptor-ligation PCR, which adds an adaptor to the 5′ terminal of the double-stranded complementary DNA of a TCR gene and then amplifies all γδ TCR genes with a common adaptor primer and a C region-specific primer (Non Patent Literature 8). Furthermore, methods applying this amplification to αβ TCR genes, followed by quantification with oligoprobes specific to individual V chains, were developed, i.e., reverse dot blot (Non Patent Literature 9) and microplate hybridization assay (Non Patent Literature 10). These are excellent methods for amplifying TCR genes without introducing a bias. However, they yield hardly any information other than V chain usage frequency. Obtaining base sequence information on the CDR3 region, J chain, D chain or antigen recognition site required subsequent cloning of the complementary DNA of the TCR gene and determination of its base sequence.

In recent years, rapidly advancing next generation sequencing techniques have enabled large-scale base sequence determination of genes. By amplifying TCR genes from a human sample by PCR and applying next generation sequencing, it is possible to realize a next generation TCR repertoire analysis method that obtains and analyzes detailed clone-level genetic information, in place of conventional TCR repertoire analysis, which has been small scale and limited to information such as V chain usage frequency. In this context, a few next generation TCR repertoire analysis methods have been developed (Patent Literatures 1 and 2), and other attempts have also been made (Patent Literatures 3-11).

CITATION LIST

Patent Literature

-   [PTL 1] International Publication No. WO 2009/137255
-   [PTL 2] International Publication No. WO 2013/059725
-   [PTL 3] Japanese Laid-Open Publication No. 10-229897
-   [PTL 4] Japanese National Phase PCT Laid-Open Publication No. 2007-515154
-   [PTL 5] Japanese National Phase PCT Laid-Open Publication No. 2012-508011
-   [PTL 6] Japanese Laid-Open Publication No. 2013-116116
-   [PTL 7] Japanese National Phase PCT Laid-Open Publication No. 2013-524848
-   [PTL 8] Japanese National Phase PCT Laid-Open Publication No. 2013-524849
-   [PTL 9] International Publication No. WO 2013/033721 A1
-   [PTL 10] International Publication No. WO 2013/043922 A1
-   [PTL 11] International Publication No. WO 2013/044234 A1

Non Patent Literature

-   [NPL 1] Cell, 1994, 76, 287-299
-   [NPL 2] Nucleic Acids Research, 2009, 37 (suppl 1), D1006-D1012
-   [NPL 3] Annual Review of Immunology, 1993, 6, 309-326
-   [NPL 4] Annual Review of Immunology, 2000, 18, 495-527
-   [NPL 5] Proc Natl Acad Sci USA, 1993, 90, 2385-2388
-   [NPL 6] Leukemia Research, 2003, 27, 305-312
-   [NPL 7] Immunology, 1999, 96, 465-472
-   [NPL 8] Journal of Immunological Methods, 1994, 169, 17-23
-   [NPL 9] Journal of Immunological Methods, 1997, 201, 145-15
-   [NPL 10] Human Immunology, 1997, 56, 57-69

SUMMARY OF INVENTION

Solution to Problem

The present invention relates to an analysis method and analysis system applied to (1) a technique for amplifying, without applying a bias, TCR or BCR genetic sequences produced by gene rearrangement from multiple gene fragments in the genome (unbiased gene amplification technique), and (2) a technology for determining, on a large scale by next generation sequencing, the base sequences of TCR or BCR genes amplified by the unbiased gene amplification technique, assigning V, D, J, and C regions, and analyzing a TCR repertoire or BCR repertoire.

For TCRs and BCRs, diverse genetic sequences are created by gene rearrangement of multiple gene fragments of the V, D, J, and C regions in the genome. To determine the base sequences of TCR or BCR genes by next generation sequencing, a widely used technology prepares numerous primers specific to the many V or J regions present and amplifies with them in the same or separate reaction solutions. However, differences in amplification efficiency among primers are a critical issue in PCR, which amplifies a small amount of a gene exponentially. Further, primers set in the V and J regions must be compatible with all known allelic sequences. For BCR genes, point mutations are introduced at a high frequency (up to about 20%) in the variable section of IgG, IgA or IgE by the somatic hypermutation mechanism. Thus, a 20 base primer would have about 4 mismatched bases. Hence, it is difficult to achieve uniform gene amplification with conventional methods. That is, known methods of designing V chain-specific primers based on the genomic sequence cannot avoid mismatches with the actual BCR genetic sequences, so quantitative gene amplification is not guaranteed. Furthermore, a BCR has an isotype and a subclass defined by its C region sequence, so a quantification method must be developed for each isotype or subclass that utilizes the differences in base sequence among them. To overcome these disadvantages of the V chain-specific primer technology currently in use, the inventors have completed a method of amplifying TCR or BCR genes, including all isotype and subtype genes, with a primer set consisting of one type of forward primer and one type of reverse primer without changing their frequency of presence, and of determining the base sequences on a large scale by next generation sequencing.
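The two quantitative points above, that small per-primer efficiency differences are magnified by exponential PCR and that a 20 base primer carries about 4 mismatches at a 20% mutation rate, can be checked with simple arithmetic. The sketch below is illustrative only: the per-cycle efficiencies (0.95 vs 0.85), the 30-cycle count, and the assumption that mutations strike each base independently are hypothetical example values, not figures from this document.

```python
"""Illustrative arithmetic for primer-efficiency bias and primer mismatches.

All numeric inputs in the demo are hypothetical example values."""


def relative_yield(eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of product from template A to template B after `cycles` rounds,
    assuming equal starting copy numbers and that each cycle multiplies the
    product by (1 + efficiency)."""
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


def expected_mismatches(primer_len: int, mutation_rate: float) -> float:
    """Expected number of mutated bases under a primer of the given length."""
    return primer_len * mutation_rate


def prob_perfect_match(primer_len: int, mutation_rate: float) -> float:
    """Probability that no base under the primer is mutated, assuming
    mutations occur independently at each position."""
    return (1.0 - mutation_rate) ** primer_len


if __name__ == "__main__":
    print(f"bias after 30 cycles: {relative_yield(0.95, 0.85, 30):.2f}-fold")
    print(f"expected mismatches in a 20-mer: {expected_mismatches(20, 0.20):.1f}")
    print(f"P(20-mer has no mismatch): {prob_perfect_match(20, 0.20):.4f}")
```

With these example values, a 10-point efficiency gap compounds to a nearly fivefold distortion of relative abundance after 30 cycles, and only about 1% of 20-mers would match a template mutated at 20% perfectly.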

Focusing on the genetic structure of TCR and BCR genes, an adaptor sequence is added to the 5′ terminal of the gene, instead of setting primers in the highly diverse V regions, so that genes comprising all V regions can be amplified.

Such an adaptor can have any length and base sequence. About 20 base pairs is optimal, but a sequence of 10 to 100 bases can be used.

The adaptor added to the 3′ terminal is removed with a restriction enzyme. All TCR or BCR genes are then amplified with an adaptor primer having the same sequence as the 20 base pair adaptor, together with a reverse primer specific to the C region sequence that the genes have in common.

A complementary DNA strand is synthesized from the messenger RNA of a TCR or BCR gene with a reverse transcriptase, and a double-stranded complementary DNA is then synthesized. Double-stranded complementary DNAs comprising V regions of different lengths are thus obtained by the reverse transcription reaction or the double strand synthesis reaction. Adaptors consisting of 20 base pairs and 10 base pairs are added to the 5′ terminal of these genes by a DNA ligase reaction.
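Why a single primer pair suffices can be shown with a toy model: after ligation every template carries the same adaptor at its 5′ end and the same C region downstream, so one adaptor primer and one C region reverse primer span every template regardless of its V region. This is a minimal sketch assuming idealized exact-match binding (no hybridization thermodynamics); the adaptor, C region site, and V segments are hypothetical sequences invented for the example.

```python
"""Toy model of adaptor-ligation PCR: one adaptor primer plus one C-region
reverse primer spans every template, whatever its V region.

All sequences are hypothetical; binding is idealized as exact matching."""
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Region of a plus-strand template spanned by the primer pair, or None
    if either primer lacks an exact binding site downstream of the other."""
    start = template.find(fwd_primer)
    site = template.find(revcomp(rev_primer), max(start, 0))
    if start < 0 or site < 0:
        return None
    return template[start:site + len(rev_primer)]


ADAPTOR = "GCTAGCTAGGATCCGTACGA"  # hypothetical 20-nt adaptor (forward primer)
C_SITE = "ACCTGGAGTTCAAGCAGCTC"   # hypothetical C-region primer site
REV_PRIMER = revcomp(C_SITE)      # reverse primer written 5'->3'

# Hypothetical V(D)J segments of different lengths and sequences.
V_SEGMENTS = ["ATGGGC" * 5, "ATGAAAC" * 9, "ATGCTT" * 12]
TEMPLATES = [ADAPTOR + v + C_SITE for v in V_SEGMENTS]

if __name__ == "__main__":
    for t in TEMPLATES:
        product = amplicon(t, ADAPTOR, REV_PRIMER)
        print(len(t), "->", None if product is None else len(product))
```

Each template yields a full-length product with the same two primers, which is the property a V-specific primer panel cannot guarantee when V sequences carry mutations.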

For BCRs, the genes can be amplified by setting a reverse primer in the C region of a μ, α, δ, γ or ε heavy chain or a κ or λ light chain; for TCRs, the reverse primer is set in the C region of the α, β, γ or δ chain.

As reverse primers set in C regions, primers are designed that match the sequence of each of Cβ, Cα, Cγ and Cδ for TCRs and each of Cμ, Cα, Cδ, Cγ, Cε, Cκ, and Cλ for BCRs, while having enough mismatches with the other C region sequences that those are not primed.
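The specificity requirement above (perfect match to the intended C region, enough mismatches to all others) can be screened in silico by counting mismatches between the primer's binding site and every window of each C region. This is a minimal sketch assuming ungapped Hamming-distance matching; the C region sequences, their names, and the off-target threshold of 3 mismatches are hypothetical.

```python
"""Sketch of an in-silico cross-priming check for C-region reverse primers.

The C-region sequences and the off-target threshold are hypothetical."""
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def min_mismatches(site: str, target: str) -> int:
    """Smallest Hamming distance between `site` and any same-length window
    of `target` (len(site) if `target` is shorter than `site`)."""
    k = len(site)
    if len(target) < k:
        return k
    return min(sum(a != b for a, b in zip(site, target[i:i + k]))
               for i in range(len(target) - k + 1))


def mismatch_report(rev_primer: str, c_regions: Dict[str, str]) -> Dict[str, int]:
    """Minimum mismatches between the primer's binding site and each C region
    (all C regions given on the plus strand)."""
    site = revcomp(rev_primer)
    return {name: min_mismatches(site, seq) for name, seq in c_regions.items()}


def is_specific(rev_primer: str, target: str, c_regions: Dict[str, str],
                min_off_target: int = 3) -> bool:
    """True if the primer matches `target` perfectly and differs from every
    other C region by at least `min_off_target` mismatches."""
    report = mismatch_report(rev_primer, c_regions)
    return report[target] == 0 and all(
        n >= min_off_target for name, n in report.items() if name != target)


SITE = "GGCACTGGCCGTCG"  # hypothetical primer site in the target C region
C_REGIONS = {
    "C_target_toy": "TT" + SITE + "AA",
    "C_other_toy": "AGCATTGGACGTTG",  # same length as SITE, 4 substitutions
}

if __name__ == "__main__":
    primer = revcomp(SITE)
    print(mismatch_report(primer, C_REGIONS))
    print(is_specific(primer, "C_target_toy", C_REGIONS))
```

A real screen would also consider mismatch position (3′-end mismatches suppress extension most strongly), which this sketch ignores.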

Each C region reverse primer is optimally designed in consideration of its base sequence, base composition, DNA melting temperature (Tm), and the presence of self-complementary sequences, so that amplification together with the adaptor primer is possible.
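The design criteria listed above (base composition, Tm, self-complementarity) can be screened with simple checks. This sketch uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate best suited to short oligos, and approximates hairpin or primer-dimer risk by the longest k-mer whose reverse complement also occurs in the primer. The acceptance thresholds (GC 40-60%, Tm 56-66 °C, self-complementary stretch of at most 5 bases) are hypothetical values, not the document's.

```python
"""Rough primer checks: GC content, Wallace-rule Tm, self-complementarity.

The Wallace rule is a crude Tm estimate; all thresholds are hypothetical."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def longest_self_complement(primer: str) -> int:
    """Length of the longest k-mer whose reverse complement also occurs in
    the primer (a proxy for hairpin or primer-dimer potential)."""
    n = len(primer)
    for k in range(n, 0, -1):
        kmers = {primer[i:i + k] for i in range(n - k + 1)}
        if any(revcomp(km) in kmers for km in kmers):
            return k
    return 0


def passes_qc(primer: str, gc_range=(0.40, 0.60), tm_range=(56, 66),
              max_self_comp: int = 5) -> bool:
    return (gc_range[0] <= gc_fraction(primer) <= gc_range[1]
            and tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and longest_self_complement(primer) <= max_self_comp)
```

For production primer design, a nearest-neighbor Tm model and a proper secondary-structure predictor would replace these approximations.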

For BCR genes, each IgG subtype (γ1, γ2, γ3, and γ4) and each IgA subtype (α1 and α2) can be amplified with the same primer, and the subtype is then determined from the base sequence.

Setting a primer in a part of the C region sequence that contains no bases differing among allelic sequences allows all alleles to be amplified uniformly.
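Candidate primer positions of this kind can be found by marking every position where aligned allele sequences disagree and listing the windows that contain no such position. This is a minimal sketch assuming the alleles are already aligned to equal length without gaps; the allele sequences in the example are hypothetical.

```python
"""Find primer windows that avoid allele-variable positions.

Assumes pre-aligned, ungapped allele sequences of equal length."""
from typing import List


def variable_positions(alleles: List[str]) -> List[bool]:
    """True at each position where the alleles do not all agree."""
    if len({len(a) for a in alleles}) != 1:
        raise ValueError("alleles must be aligned to equal length")
    return [len({a[i] for a in alleles}) > 1 for i in range(len(alleles[0]))]


def conserved_windows(alleles: List[str], length: int) -> List[int]:
    """Start positions of every window of `length` bases in which all alleles agree."""
    starts = []
    run = 0  # consecutive allele-invariant positions ending at position i
    for i, is_variable in enumerate(variable_positions(alleles)):
        run = 0 if is_variable else run + 1
        if run >= length:
            starts.append(i - length + 1)
    return starts


if __name__ == "__main__":
    alleles = ["ACGTACGTAC", "ACGTTCGTAC"]  # hypothetical; differ only at position 4
    print(conserved_windows(alleles, 3))
```

In practice the `length` argument would be set within the primer length ranges given in this section, and the surviving windows would then be filtered by the specificity and Tm checks.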

Multiple rounds of nested PCR are performed to enhance the specificity of the amplification reaction.

The length (number of bases) of each primer candidate sequence, which is chosen so as not to include bases that differ among allelic sequences, is not particularly limited; however, it is 10-100 bases, preferably 15-50 bases, and more preferably 20-30 bases. Thus, the present invention also provides the following.

<In Silico>

In one aspect, the present invention relates to a technology for analyzing a TCR or BCR repertoire based on a group of expressed TCR or BCR genetic sequences derived from a biological sample.

The present invention can be applied to any V(-D)-J-C nucleic acid sequence regardless of the sequencer model. Classification itself is possible even when the input sequences were not obtained by unbiased amplification.

The input can be either a plus strand or complementary strand.
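One simple way to accept reads from either strand is to orient each read before homology search by comparing how many k-mers it, and its reverse complement, share with a reference sequence such as a C region. This is a minimal sketch of that idea, not the document's method; the choice of k = 8 and the reference sequence used in testing are assumptions.

```python
"""Sketch: accept reads on either strand by orienting them against a reference.

Orientation is chosen by shared k-mer counts; k=8 is a hypothetical choice."""
from typing import Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read: str, reference: str, k: int = 8) -> Tuple[str, str]:
    """Return (read in plus-strand orientation, '+' if unchanged or '-' if
    reverse-complemented)."""
    ref_kmers = kmer_set(reference, k)
    rc = revcomp(read)
    fwd_hits = len(kmer_set(read, k) & ref_kmers)
    rev_hits = len(kmer_set(rc, k) & ref_kmers)
    return (read, "+") if fwd_hits >= rev_hits else (rc, "-")
```

After orientation, all downstream steps can assume plus-strand input.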

To classify nucleic acid sequences, it is common to set up a reference database that accumulates standard sequences serving as the baseline of classification (hereinafter referred to as reference sequences), and to assign each nucleic acid sequence to one of the reference sequences by homology search. In this case, however, an enormous number of reference sequences covering every combination of V, D, and J regions would have to be prepared, which is not practical. Setting up a separate reference database for each of V, D, and J is conceivable. However, differences from the reference sequence would be large because of random mutations in V, and D and J are short regions. Thus, with a common homology search technology, the possibility of missed assignments cannot be ignored. Translating the entire nucleic acid sequence to be analyzed into an amino acid sequence and classifying it on that basis is also conceivable. However, such a technology would be especially vulnerable to sequencing errors from insertions/deletions, and the relationship with previously reported gene names and alleles would be unknown, making it difficult to use known information.

The reference database used in the present invention is prepared for each of the V, D, and J (and C for BCR) gene regions. Typically, the nucleic acid sequence data set for each allele of each region published by IMGT is used, but the database is not limited thereto; any data set in which a unique ID is assigned to each sequence can be used.
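A per-region reference set with unique IDs can be loaded from FASTA files. This is a minimal sketch assuming headers in the pipe-delimited style of IMGT/GENE-DB downloads, where the second field is taken to be the allele name (e.g. `>X|TRBV5-1*01|Homo sapiens|F|...`); for other headers the first whitespace-delimited token is used as the ID. Duplicate IDs are rejected, since the method relies on each sequence having a unique ID.

```python
"""Sketch: load a per-region reference set in which every sequence has a unique ID.

Assumption: IMGT-style pipe-delimited headers with the allele name in the
second field; otherwise the first whitespace-delimited token is the ID."""
from typing import Dict, List, Optional


def _header_id(header: str) -> str:
    fields = header[1:].split("|")
    return fields[1] if len(fields) > 1 else header[1:].split()[0]


def load_reference(fasta_text: str) -> Dict[str, str]:
    refs: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        if current is not None:
            # Drop alignment gap dots, if any, and normalize to uppercase.
            refs[current] = "".join(chunks).replace(".", "").upper()

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            current = _header_id(line)
            if current in refs:
                raise ValueError(f"duplicate reference ID: {current}")
            chunks = []
        else:
            chunks.append(line)
    flush()
    return refs
```

One dictionary would be built per gene region (V, D, J, and C for BCR).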

For the input sequence set used in the present invention, adaptor sequences and low-quality regions are generally trimmed in advance, and only sequences of sufficient length for analysis are extracted to construct a high-quality set. This step is not strictly required, because even without such processing a low-quality (LQ) sequence would simply be classified as "unclassifiable"; it is, however, used in a preferred embodiment.
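The trimming parameters are given in items <6> and <7> below: low-quality ends defined by a 7 bp moving average of QV below 30, removal of adaptor matches of 10 bp or more at both ends, and a minimum remaining length of 200 bp (TCR) or 300 bp (BCR). The sketch below applies those numbers, but how the ends are trimmed exactly (keeping the span from the first to the last window whose mean QV is at least 30, and exact-match adaptor overlap) is an interpretation made for this example; the adaptor sequence used in testing is hypothetical.

```python
"""Sketch of read trimming with the parameters of items <6> and <7>.

Window placement and exact adaptor matching are interpretations of this sketch."""
from typing import List, Optional, Tuple

MIN_LENGTH = {"TCR": 200, "BCR": 300}


def quality_bounds(quals: List[int], window: int = 7,
                   min_qv: float = 30.0) -> Optional[Tuple[int, int]]:
    """(start, end) spanning the first through last window whose mean QV is
    at least `min_qv`; None if no window qualifies."""
    good = [i for i in range(len(quals) - window + 1)
            if sum(quals[i:i + window]) / window >= min_qv]
    if not good:
        return None
    return good[0], good[-1] + window


def adaptor_overlap(read: str, adaptor: str, at_5prime: bool,
                    min_match: int = 10) -> int:
    """Longest exact overlap (>= min_match) of the read's 5' end with the
    adaptor's 3' end, or of the read's 3' end with the adaptor's 5' end."""
    for k in range(min(len(read), len(adaptor)), min_match - 1, -1):
        if at_5prime and read[:k] == adaptor[-k:]:
            return k
        if not at_5prime and read[-k:] == adaptor[:k]:
            return k
    return 0


def trim_read(seq: str, quals: List[int], adaptor: str,
              receptor: str = "TCR") -> Optional[str]:
    """High-quality trimmed read, or None if it is too short to analyze."""
    bounds = quality_bounds(quals)
    if bounds is None:
        return None
    seq = seq[bounds[0]:bounds[1]]
    start = adaptor_overlap(seq, adaptor, at_5prime=True)
    end = len(seq) - adaptor_overlap(seq, adaptor, at_5prime=False)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= MIN_LENGTH[receptor] else None
```

Note that a moving average trims low-quality ends only partially; a read beginning with five bases of QV 10 followed by QV 40 keeps its last two low-quality bases under this rule.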

Each sequence in the input set is searched for homology against the reference database of each gene region, and the alignment with the closest reference allele is recorded together with that allele's sequence. For this homology search, an algorithm with high tolerance for mismatches is used for every region except C. For instance, when a common homology search program such as BLAST is used, settings such as a shortened window size, a reduced mismatch penalty, or a reduced gap penalty are configured for each region.
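The kind of mismatch-tolerant search described above can be illustrated with a small Smith-Waterman local alignment (linear gap penalty) that also reports the indicators used later for allele selection: number of matching bases, alignment length, and kernel length (longest run of consecutive matches). This is a stand-in for a BLAST or FASTA search with relaxed settings, not the tool used in the document; the default scores (match +1, mismatch −1, gap −2) are hypothetical, and a gap penalty of 0, as in the D region condition of item <12>, can be passed in.

```python
"""Sketch of a mismatch-tolerant local alignment (Smith-Waterman, linear gap)
reporting score, matches, alignment length, and kernel length.

Scoring defaults are hypothetical; this is not BLAST or FASTA."""
from dataclasses import dataclass


@dataclass
class Hit:
    score: int
    matches: int     # number of matching bases in the alignment
    aln_length: int  # alignment columns, including mismatches and gaps
    kernel: int      # longest run of consecutive matching bases


def local_align(query: str, ref: str, match: int = 1, mismatch: int = -1,
                gap: int = -2) -> Hit:
    n, m = len(query), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell, recording whether each column matches.
    columns = []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            columns.append(query[i - 1] == ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            columns.append(False)
            i -= 1
        else:
            columns.append(False)
            j -= 1
    kernel = run = 0
    for is_match in columns:
        run = run + 1 if is_match else 0
        kernel = max(kernel, run)
    return Hit(best, sum(columns), len(columns), kernel)
```

With a single substitution in a 10 base query, the alignment still spans all 10 bases (9 matches), while the kernel length drops to the longer of the two matching runs on either side of the mismatch.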

The closest reference allele is selected using the homology score, alignment length, kernel length (length of a consecutively matching base sequence), and number of matching bases as indicators, applied in a defined order of priority.
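A concrete priority order is given in item <9> below (number of matching bases, then kernel length, then score, then alignment length), and per-region minimum lengths in item <12>. The sketch below ranks candidate hits that way. Treating the "shortest hit length" of item <12> as a minimum alignment length is an interpretation made here, and the candidate hits in the test are hypothetical.

```python
"""Sketch: choose the closest reference allele from candidate hits, ranked
by matches, kernel length, score, then alignment length (item <9>), after
applying the per-region minima of item <12>."""
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    allele: str
    matches: int
    kernel: int
    score: float
    aln_length: int


# region -> (minimum alignment/hit length, minimum kernel length)
MINIMA: Dict[str, Tuple[int, int]] = {
    "V": (30, 15), "D": (11, 8), "J": (18, 10), "C": (30, 15)}


def closest_allele(hits: List[Candidate], region: str) -> Optional[str]:
    min_len, min_kernel = MINIMA[region]
    eligible = [h for h in hits
                if h.aln_length >= min_len and h.kernel >= min_kernel]
    if not eligible:
        return None  # unclassifiable for this region
    best = max(eligible,
               key=lambda h: (h.matches, h.kernel, h.score, h.aln_length))
    return best.allele
```

Because Python compares tuples element by element, the `key` tuple encodes the priority order directly: a later indicator only matters when all earlier ones tie.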

For each input sequence whose V and J have been determined, the CDR3 sequence is extracted using the start of CDR3 on the reference V and the end of CDR3 on the reference J as guides. This sequence is translated into an amino acid sequence and used to classify the D region. When a reference database for the D region is available, the combination of the homology search result and the amino acid translation result is used as the classification result.
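The extraction and translation step can be sketched as follows. Assumptions: the V and J alignments are ungapped around the CDR3 boundaries, so a reference coordinate maps to a read coordinate by a constant offset, and the boundary positions on the references (here hypothetical numbers) would come from reference annotation. Whether the conserved Cys and Phe/Trp codons are included is a convention choice; the example includes them, which corresponds to what IMGT calls the junction. Translation uses the standard genetic code.

```python
"""Sketch: extract CDR3 using reference-defined boundaries and translate it.

Assumes ungapped V/J alignments near the boundaries; positions are hypothetical."""
from dataclasses import dataclass
from typing import Optional

_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}


@dataclass
class Anchor:
    read_start: int  # read position where the alignment begins
    ref_start: int   # reference position aligned to read_start


def to_read(anchor: Anchor, ref_pos: int) -> int:
    return anchor.read_start + (ref_pos - anchor.ref_start)


def translate(nt: str) -> Optional[str]:
    """Standard-code translation; None if out of frame, 'X' for ambiguous codons."""
    if len(nt) % 3:
        return None
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


def extract_cdr3(read: str, v_anchor: Anchor, v_cdr3_start: int,
                 j_anchor: Anchor, j_cdr3_end: int) -> str:
    """Slice from the CDR3 start on reference V to the CDR3 end (exclusive)
    on reference J, mapped into read coordinates."""
    return read[to_read(v_anchor, v_cdr3_start):to_read(j_anchor, j_cdr3_end)]
```

A `None` from `translate` flags an out-of-frame (non-productive) rearrangement or an indel sequencing error, which can be reported separately rather than classified.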

In this way, an allele of each of V, D and J (and C for BCR) is assigned to each sequence in the input set. The frequency of appearance of each V, D and J (and C for BCR), or of a combination thereof, in the entire input set is then calculated to derive the TCR or BCR repertoire. Frequencies are calculated per allele or per gene name depending on the precision required for classification; the latter is obtained by converting each allele to its gene name. Thus, the present invention also provides the following.
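The frequency calculation can be sketched with a counter over per-read assignments. This assumes IMGT-style allele names, where the gene name is everything before the `*` (e.g. `TRBV5-1*01` becomes `TRBV5-1`), and assumes unassigned reads have already been filtered out; the read assignments in the test are hypothetical.

```python
"""Sketch: derive repertoire frequencies per allele or per gene name from
per-read assignments (IMGT-style '*' allele suffixes assumed)."""
from collections import Counter
from typing import Dict, List, Sequence


def gene_name(allele: str) -> str:
    return allele.split("*", 1)[0]


def repertoire(assignments: List[Dict[str, str]],
               regions: Sequence[str] = ("V", "J"),
               unit: str = "gene") -> Dict[str, Dict[str, float]]:
    """Frequency of each region's allele/gene and of their combination
    (stored under a key such as 'V+J', with names joined by '|')."""
    convert = gene_name if unit == "gene" else (lambda allele: allele)
    total = len(assignments)
    result: Dict[str, Dict[str, float]] = {}
    for region in regions:
        counts = Counter(convert(a[region]) for a in assignments)
        result[region] = {name: n / total for name, n in counts.items()}
    combos = Counter("|".join(convert(a[r]) for r in regions) for a in assignments)
    result["+".join(regions)] = {name: n / total for name, n in combos.items()}
    return result
```

Passing `regions=("V", "D", "J")` or adding `"C"` for BCR extends the same calculation to the other gene regions and their combinations.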

<1> A method of analyzing a TCR or BCR repertoire, comprising the following steps:
(1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire.

<2> The method of item <1>, wherein the gene region comprises all of the V region, the D region, the J region and optionally the C region.

<3> The method of any one of items <1>-<2>, wherein the reference database is a database with a unique ID assigned to each sequence.

<4> The method of any one of items <1>-<3>, wherein the input sequence set is an unbiased sequence set.

<5> The method of any one of items <1>-<4>, wherein the sequence set is trimmed.

<6> The method of any one of items <1>-<5>, wherein the trimming is accomplished by the steps of: deleting low quality regions from both ends of a read; deleting a region matching an adaptor sequence over 10 bp or more from both ends of the read; and using the read as a high quality read in analysis when a remaining length is 200 bp or more (TCR) or 300 bp or more (BCR).

<7> The method of item <6>, wherein the low quality refers to a 7 bp moving average of QV value less than 30.

<8> The method of any one of items <1>-<7>, wherein the approximate sequence is the closest sequence.

<9> The method of any one of items <1>-<8>, wherein the approximate sequence is determined by a ranking of 1. number of matching bases, 2. kernel length, 3. score, and 4. alignment length.

<10> The method of any one of items <1>-<9>, wherein the homology search is conducted under a condition that tolerates random mutations scattered throughout the sequence.

<11> The method of any one of items <1>-<10>, wherein the homology search comprises at least one condition from (1) shortening of a window size, (2) reduction in a mismatch penalty, (3) reduction in a gap penalty, and (4) giving the number of matching bases top priority among the ranking indicators, compared to a default condition.

<12> The method of any one of items <1>-<11>, wherein the homology search is carried out under the following conditions in BLAST or FASTA:

V: mismatch penalty=−1, shortest alignment length=30, and shortest kernel length=15;

D: word length=7 (for BLAST) or K-tup=3 (for FASTA), mismatch penalty=−1, gap penalty=0, shortest alignment length=11, and shortest kernel length=8;

J: mismatch penalty=−1, shortest hit length=18, and shortest kernel length=10; and

C: shortest hit length=30 and shortest kernel length=15.

<13> The method of any one of items <1>-<12>, wherein the D region is classified by a frequency of appearance of the amino acid sequence.

<14> The method of any one of items <1>-<13>, wherein, when there is a reference database for the D region in the step (5), a combination of a result of a homology search with the nucleic acid sequence of CDR3 and a result of amino acid sequence translation is used as a classification result.

<15> The method of any one of items <1>-<14>, wherein, when there is no reference database for the D region in the step (5), only the frequency of appearance of the amino acid sequence is used for classification.

<16> The method of any one of items <1>-<15>, wherein the frequency of appearance is counted in units of gene names and/or in units of alleles.

<17> The method of any one of items <1>-<16>, wherein the step (4) comprises assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides.

<18> The method of any one of items <1>-<17>, wherein the step (5) comprises translating the nucleic acid sequence of the CDR3 into an amino acid sequence and classifying the D region by using the amino acid sequence.

<19> A system for analyzing a TCR or BCR repertoire, wherein the system comprises:
(1) means for providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) means for providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) means for searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) means for assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) means for translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) means for calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.

<19A> The system of item <19> having one or more features of any one of items <1>-<18>.

<20> A computer program for causing a computer to execute processing of a method of analyzing a TCR or BCR repertoire, the method comprising the following steps:
(1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.

<20A> The program of item <20> having one or more features of any one of items <1>-<18>.

<21> A recording medium storing a computer program for causing a computer to execute processing of a method of analyzing a TCR or BCR repertoire, the method comprising the following steps:
(1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.

<21A> The recording medium of item <21> having one or more features of any one of items <1>-<18>.
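Steps (4) to (6) above can be illustrated with a minimal Python sketch. It assumes that V and J alleles have already been assigned to each read by the homology search of step (3) and that the CDR3 nucleotide sequence has already been extracted; it then translates CDR3 with the standard genetic code, discards non-productive reads, and counts V, J, V-J and V-J-CDR3 usage either per gene name or per allele (items <16> and <18>). The IMGT-style `*01` allele suffix, the tuple layout and all function names are assumptions made for illustration, and the reference-based D region classification of item <14> is not modelled.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical read record: (assigned V allele, assigned J allele, CDR3 nucleotides)
Read = Tuple[str, str, str]


def translate(nt: str) -> Optional[str]:
    """Translate an in-frame CDR3; return None when the length is not a multiple of 3."""
    nt = nt.upper()
    if len(nt) % 3:
        return None
    # Codons containing ambiguous bases (e.g. N) are translated as 'X'.
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


def gene_name(allele: str) -> str:
    """Drop an IMGT-style allele suffix: 'TRBV29-1*01' -> 'TRBV29-1'."""
    return allele.split("*", 1)[0]


def repertoire_counts(reads: Iterable[Read], by_allele: bool = False) -> Dict[str, Counter]:
    """Count V, J, V-J and V-J-CDR3(aa) usage over productive reads."""
    unit = (lambda name: name) if by_allele else gene_name
    counts = {"V": Counter(), "J": Counter(), "VJ": Counter(), "clone": Counter()}
    for v, j, cdr3_nt in reads:
        aa = translate(cdr3_nt)
        if aa is None or "*" in aa:
            continue  # out of frame or stop codon: non-productive rearrangement
        v_key, j_key = unit(v), unit(j)
        counts["V"][v_key] += 1
        counts["J"][j_key] += 1
        counts["VJ"][(v_key, j_key)] += 1
        counts["clone"][(v_key, j_key, aa)] += 1
    return counts


def frequencies(counter: Counter) -> Dict:
    """Convert counts into frequencies of appearance that sum to 1."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()} if total else {}


if __name__ == "__main__":
    reads = [
        ("TRBV29-1*01", "TRBJ2-7*01", "TGTGCCAGCAGTTTC"),  # translates to CASSF
        ("TRBV29-1*02", "TRBJ2-7*01", "TGTGCCAGCAGTTTC"),
        ("TRBV5-1*01", "TRBJ1-1*01", "TGTTAAGCC"),  # contains a stop codon
    ]
    print(frequencies(repertoire_counts(reads)["V"]))
    print(repertoire_counts(reads, by_allele=True)["V"])
```

Counting by gene name merges the two TRBV29-1 alleles into one entry, while `by_allele=True` keeps them apart, which is the distinction drawn in item <16>.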

<Wet>

In another aspect, the present invention is (1) a technology for uniformly amplifying a TCR or BCR genetic sequence produced by gene rearrangement from a plurality of gene fragments in a genome without applying a bias (unbiased gene amplification technology), and (2) a technology for determining, on a large scale by a next generation sequencing method, the base sequence of a TCR or BCR gene amplified by the unbiased gene amplification technology, assigning V, D, J, and C regions, and analyzing a TCR repertoire or BCR repertoire.

For TCRs and BCRs, diverse genetic sequences are created by gene rearrangement of a plurality of gene fragments of the V, D, J, and C regions on a genome. To determine the base sequence of a TCR or BCR gene by a next generation sequencing technique, a widely used technology produces a large number of primers specific to the many V or J regions that are present and amplifies in the same reaction solution or in separate reaction solutions. However, differences in amplification efficiency among primers are a critical issue in PCR, which amplifies a small amount of gene exponentially. Further, the primers set to V and J regions must be compatible with all known allelic sequences. In BCR genes, point mutations are introduced at a high frequency (up to about 20%) into the variable region of IgG, IgA or IgE by a somatic hypermutation mechanism. Thus, if a 20 base primer is set, about 4 bases would be mismatched. Hence, it is difficult to achieve uniform gene amplification with a conventional method. That is, known methods of designing a V chain specific primer based on a genomic sequence cannot avoid mismatches with the actual BCR genetic sequence, so quantitative gene amplification is not guaranteed. Furthermore, a BCR has an isotype and a subclass defined by its C region sequence, and a quantification method must be developed for each isotype or subclass that utilizes the differences in base sequence among isotypes or subclasses. In order to overcome the disadvantages of the V chain specific primer technology currently in use, the inventors have completed a method of amplifying TCR or BCR genes, including the genes of all isotypes and subtypes, with a primer set consisting of one type of forward primer and one type of reverse primer without changing their frequency of presence, and of determining the base sequences on a large scale by next generation sequencing.
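The figure of about 4 mismatched bases is the expected value 20 × 0.20. A short sketch that treats each primer position as independently mutated with probability 0.20 (a simplifying assumption, since somatic hypermutation is known to concentrate at hotspots) also shows how rarely a V region specific primer would match perfectly under that worst-case rate:

```python
from math import comb


def mismatch_distribution(primer_len: int, mutation_rate: float) -> list:
    """Binomial P(k mismatched bases), k = 0..primer_len, assuming independent mutations."""
    return [
        comb(primer_len, k) * mutation_rate ** k * (1 - mutation_rate) ** (primer_len - k)
        for k in range(primer_len + 1)
    ]


probs = mismatch_distribution(20, 0.20)
expected = sum(k * p for k, p in enumerate(probs))  # equals 20 * 0.20 = 4 bases
p_perfect = probs[0]                                 # 0.8 ** 20, roughly 1%
p_three_or_more = sum(probs[3:])                     # roughly 0.79
print(f"expected={expected:.2f} perfect={p_perfect:.4f} >=3={p_three_or_more:.3f}")
```

Under these assumptions roughly four primer-template pairs in five would carry three or more mismatches, which illustrates why the text treats V region primers as incompatible with quantitative amplification of mutated BCR genes.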

Focus was placed on the genetic structure of TCR and BCR genes. Instead of setting primers in the highly diverse V regions, an adaptor sequence is added to the 5′ terminus of the genes so that genes comprising any V region can be amplified.

Such an adaptor can have any length or base sequence. About 20 base pairs are optimal, but a sequence of 10 to 100 bases can be used.

An adaptor added to the 3′ terminus is removed with a restriction enzyme. All TCR or BCR genes are then amplified with a reverse primer specific to the C region, a sequence common to the genes of a given chain, together with an adaptor primer having the same sequence as the 20 base pair adaptor.

A complementary strand DNA is synthesized with a reverse transcriptase from a TCR or BCR gene messenger RNA, and then a double stranded complementary DNA is synthesized. A double stranded complementary DNA comprising V regions of different lengths is synthesized by a reverse transcription reaction or a double strand synthesizing reaction. Adaptors consisting of 20 base pairs and 10 base pairs are added to the 5′ terminal section of such genes by a DNA ligase reaction.

The genes can be amplified by setting a reverse primer in a C region of a heavy chain (μ chain, α chain, δ chain, γ chain or ε chain) or a light chain (κ chain or λ chain) for BCRs, and of an α chain, β chain, γ chain or δ chain for TCRs.

As a reverse primer set in a C region, a primer is set which matches the sequence of each of Cβ, Cα, Cγ and Cδ for TCRs and the sequence of each of Cμ, Cα, Cδ, Cγ, Cε, Cκ, and Cλ for BCRs, and which has enough mismatches that other C region sequences are not primed.

A reverse primer of a C region is optimally designed while considering the base sequence, base composition, DNA melting temperature (Tm), and presence of a self-complementary sequence, such that amplification with an adaptor primer is possible.

Each BCR gene IgG subtype (γ1, γ2, γ3, and γ4) and IgA subtype (α1 and α2) can be amplified with the same primer, and the subtype is then determined from the base sequence.
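Because one primer is shared by all IgG (or IgA) subtypes, the subtype has to be read from the sequenced C region segment between the primer and the V region. A minimal sketch of such a call by nearest reference, assuming the read has already been trimmed to a fixed-length C region segment aligned to the references; the reference fragments are toy placeholders, not real IgG subtype sequences, and `max_mismatches` is a hypothetical tolerance parameter:

```python
from typing import Dict, Optional

# Toy, made-up reference fragments for illustration only (NOT real IgG subtype sequences).
TOY_REFS = {
    "IgG1": "ACGTACGTAC",
    "IgG2": "ACGTTCGTAC",
    "IgG3": "ACCTACGTAA",
    "IgG4": "ACGTACGAAC",
}


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def call_subtype(segment: str, references: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the subtype whose reference is uniquely closest, or None if unresolved."""
    scored = sorted((hamming(segment, ref), name) for name, ref in references.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None  # too divergent from every reference
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # tie: the segment does not resolve the subtype
    return best_name


print(call_subtype("ACGTTCGTAC", TOY_REFS))
```

Returning `None` on ties reflects the design requirement, stated later for the C region primers, that a mismatching base between subtypes must lie within the amplified segment.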

A primer can be set in a region other than the bases that differ among allelic sequences of a C region sequence, so that all alleles are amplified uniformly.

A plurality of stages of nested PCR are performed in order to enhance the specificity of the amplification reaction.

The length (number of bases) of each primer candidate sequence, which does not comprise a sequence that differs among allelic sequences, is not particularly limited. However, the number of bases is 10-100, preferably 15-50, and more preferably 20-30. Thus, the present invention also provides the following.

<A1> A method of preparing a sample for quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or B cell receptor (BCR) by genetic sequence analysis using a database, comprising the steps of:
(1) synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template;
(2) synthesizing a double stranded complementary DNA by using the complementary DNA as a template;
(3) synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA;
(4) performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer,

wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified;

(5) performing a second PCR amplification reaction by using a PCR amplicon of (4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; and
(6) performing a third PCR amplification reaction by using a PCR amplicon of (5), an added common adaptor primer in which a nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence and a molecule identification (MID Tag) sequence are added to a third TCR or BCR C region specific sequence; wherein

the third TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the second TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified,

the first additional adaptor nucleic acid sequence is a sequence suitable for binding to a DNA capturing bead and for an emPCR reaction,

the second additional adaptor nucleic acid sequence is a sequence suitable for an emPCR reaction, and

the molecule identification (MID Tag) sequence is a sequence for imparting uniqueness such that an amplicon can be identified.
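The adaptor-added third primer of step (6) is a concatenation of the second additional adaptor, the MID tag and the C region specific sequence, so reads that start at that adaptor can be sorted back to their source sample by the tag. A minimal sketch, assuming exact matching, MID tags of equal length and reads that begin at the second additional adaptor; the function names and all sequences are hypothetical placeholders, not the adaptors or tags used in the invention:

```python
from typing import Dict, Iterable, List


def build_third_primer(adaptor: str, mid: str, c_specific: str) -> str:
    """5'-[second additional adaptor][MID tag][C region specific sequence]-3'."""
    return adaptor + mid + c_specific


def demultiplex(reads: Iterable[str], adaptor: str, mids: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by the MID tag that immediately follows the adaptor (exact match only)."""
    bins: Dict[str, List[str]] = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        label = "unassigned"
        if read.startswith(adaptor):
            rest = read[len(adaptor):]
            for name, tag in mids.items():
                if rest.startswith(tag):
                    label = name
                    break
        bins[label].append(read)
    return bins


# Hypothetical placeholder sequences; real adaptors and MID tags are platform specific.
ADAPTOR_B = "GGGGAAAA"
MIDS = {"sample1": "ACAC", "sample2": "GTGT"}
C_SPECIFIC = "TTCCGG"

primer_s1 = build_third_primer(ADAPTOR_B, MIDS["sample1"], C_SPECIFIC)
reads = [primer_s1 + "NNNN", ADAPTOR_B + MIDS["sample2"] + C_SPECIFIC + "A", "CCCC"]
bins = demultiplex(reads, ADAPTOR_B, MIDS)
print({name: len(group) for name, group in bins.items()})
```

A production pipeline would usually tolerate sequencing errors in the tag; exact matching is kept here only to make the structure of the primer visible.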

<A2> The method of item <A1>, wherein, for a BCR, the C region specific primer comprises a sequence that is a complete match with an isotype C region of interest selected from the group consisting of IgM, IgA, IgG, IgE and IgD and is not homologous with other C regions, and, for IgG or IgA, is a sequence that is a complete match with one of the subtypes IgG1, IgG2, IgG3 and IgG4 or one of IgA1 and IgA2, or, for a TCR, the C region specific primer is a sequence that is a complete match with a C region of a chain of interest selected from the group consisting of α chain, β chain, γ chain and δ chain and is not homologous with other C regions.

<A3> The method of item <A1> or <A2>, wherein a portion of a sequence that is a complete match with all C region allelic sequences of the same isotype in the database is selected for the C region specific primer.

<A4> The method of any one of items <A1>-<A3>, wherein the common adaptor primer is designed such that the primer is unlikely to form homodimer and intramolecular hairpin structures and can stably form a double strand, and designed not to be highly homologous with any TCR genetic sequence in the database and to have the same level of melting temperature (Tm) as the C region specific primer.

<A5> The method of item <A4>, wherein a common adaptor primer designed not to have homodimer and intramolecular hairpin structures and to have homology with other genes comprising a BCR or TCR is selected.

<A6> The method of item <A5>, wherein the common adaptor primer is P20EA (SEQ ID NO: 2) and/or P10EA (SEQ ID NO: 3).

<A7> The method of any one of items <A1>-<A6>, wherein the first, second and third TCR or BCR C region specific primers are each independently a primer for BCR repertoire analysis, the primer being selected to be a sequence that is a complete match with each isotype C region of IgM, IgG, IgA, IgD or IgE, and a complete match with the subtypes for IgG and IgA, and not homologous with other sequences comprised in the database, and to comprise a mismatching base between subtypes downstream of the primer, and

wherein the common adaptor primer sequence is designed such that the sequence has a base length suitable for amplification, is unlikely to form homodimer and intramolecular hairpin structures, and is able to stably form a double strand, and designed not to be highly homologous with any TCR genetic sequence in the database and to have the same level of Tm as the C region specific primer.

<A8> The method of any one of items <A1>-<A7>, wherein the first, second and third TCR or BCR C region specific primers are each independently a primer for TCR or BCR repertoire analysis, each primer being selected to be a sequence that is a complete match with 1 type of α chain (TRAC), 2 types of β chains (TRBC1 and TRBC2), 2 types of γ chains (TRGC1 and TRGC2), and one type of δ chain (TRDC1) and is not homologous with other sequences comprised in the database, and to comprise a mismatching base between subtypes downstream of the primer,

wherein the common adaptor primer sequence is designed such that the sequence has a base length suitable for amplification, is unlikely to form homodimer and intramolecular hairpin structures, and is able to stably form a double strand, and designed not to be highly homologous with any TCR genetic sequence in the database and to have the same level of Tm as the C region specific primer.

<A9> The method of any one of items <A1>-<A8>, wherein the third TCR or BCR C region specific primer is set in a region that is up to about 150 bases from the 5′ terminal side of a C region, and the first TCR or BCR C region specific primer and the second TCR or BCR C region specific primer are set in a region extending from the 5′ terminal side of a C region to about 300 bases.

<A10> The method of any one of items <A1>-<A9>, wherein the first, second and third TCR or BCR C region specific primers are each independently for BCR quantitative analysis,

wherein separate specific primers are set to 5 types of isotype sequences, and the primers are designed to completely match a target sequence and to ensure a mismatch of 5 bases or more for other isotypes, and are designed to be a complete match with all subtypes such that one type of primer is compatible with each of the similar IgG subtypes (IgG1, IgG2, IgG3 and IgG4) or IgA subtypes (IgA1 and IgA2).
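The specificity rule above (a complete match to every target subtype and a mismatch of 5 bases or more against every other isotype) can be checked with an ungapped sliding comparison. This sketch assumes the primer is given as its binding-site sequence in the same orientation as the reference sequences (for a reverse primer, its reverse complement); it ignores gaps and the position of mismatches relative to the 3′ end, which a real design would weigh, and the function names are hypothetical:

```python
from typing import Iterable


def min_mismatches(primer: str, target: str) -> int:
    """Fewest mismatches of the primer over every ungapped placement on the target."""
    n = len(primer)
    if len(target) < n:
        raise ValueError("target is shorter than the primer")
    return min(
        sum(a != b for a, b in zip(primer, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def passes_isotype_rule(primer: str, on_target: Iterable[str],
                        off_target: Iterable[str], min_off: int = 5) -> bool:
    """Perfect match to every target subtype and >= min_off mismatches to every other isotype."""
    return (all(min_mismatches(primer, s) == 0 for s in on_target)
            and all(min_mismatches(primer, s) >= min_off for s in off_target))


print(passes_isotype_rule("ACGTACGT", ["TTACGTACGTTT"], ["TTTTTTTTTTTT"]))
```

In practice the on-target list would hold every allele of every subtype of the isotype, and the off-target list the C regions of the other four isotypes.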

<A11> The method of any one of items <A1>-<A10>, wherein parameters in primer design are set to: a base sequence length of 18-22 bases; a melting temperature of 54-66° C.; and a % GC (% guanine-cytosine content) of 40-65%.

<A12> The method of any one of items <A1>-<A11>, wherein parameters in primer design are set to: a base sequence length of 18-22 bases; a melting temperature of 54-66° C.; a % GC (% guanine-cytosine content) of 40-65%; a self-annealing score of 26; a self-end annealing score of 10; and a secondary structure score of 28.

<A13> The method of any one of items <A1>-<A12>, wherein sequences of the first, second and third TCR or BCR C region specific primers are determined under the following conditions:
1. a plurality of subtype sequences and/or allelic sequences are uploaded into base sequence analysis software and aligned;
2. primer design software is used to search for a plurality of primers satisfying the parametric conditions in a C region;
3. a primer in a region without a mismatching base in the sequences aligned in 1 is selected; and
4. the presence of a plurality of mismatching sequences for each subtype and/or allele downstream of the primer determined in 3 is confirmed, and if there is no such sequence, a primer is searched for further upstream, which is repeated as needed.

<A14> The method of any one of items <A1>-<A13>, wherein, with the first base of the first codon of a C region sequence produced by splicing as the baseline, the first TCR or BCR C region specific primer is set in a position at bases 41-300, the second TCR or BCR C region specific primer is set in a position at bases 21-300, and the third TCR or BCR C region specific primer is set in a position within 150 bases, and the positions comprise a mismatching site in a subtype and/or allele.

<A15> The method of any one of items <A1>-<A14>, wherein the first TCR or BCR C region specific primer has the following structure: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CA1 (SEQ ID NO: 35) or CB1 (SEQ ID NO: 37).

<A16> The method of any one of items <A1>-<A15>, wherein the second TCR or BCR C region specific primer has the following structure: CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CA2 (SEQ ID NO: 35), or CB2 (SEQ ID NO: 37).

<A17> The method of any one of items <A1>-<A16>, wherein the third TCR or BCR C region specific primer has the following structure: CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16) or CE3-GS (SEQ ID NO: 19).

<A18> The method of any one of items <A1>-<A17>, wherein each of the TCR or BCR C region specific primers is provided in a set compatible with all TCR or BCR subclasses.

<A19> A method of performing gene analysis using a sample manufactured by the method of any one of items <A1>-<A18>.

<A20> The method of item <A19>, wherein the gene analysis is the quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR).
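Steps 2 to 4 of item <A13>, combined with the length, Tm and % GC ranges of item <A11>, can be sketched as a window search over an alignment of subtype or allele sequences. Assumptions: the aligned sequences are gap-free, of equal length and in sense orientation; Tm is approximated by the Wallace rule (2 °C per A/T, 4 °C per G/C), a swapped-in estimate rather than the method of the primer design software referred to above; the self-annealing and secondary structure scores of item <A12> are not modelled; and, as for a reverse primer in the C region whose extension runs toward the V region, the discriminating column required by step 4 must lie 5′ of the primer site on the sense strand. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    gc = sum(base in "GC" for base in seq)
    return 2 * (len(seq) - gc) + 4 * gc


def conserved_columns(aligned: Sequence[str]) -> List[bool]:
    """True for alignment columns where every sequence carries the same base."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [len(set(column)) == 1 for column in zip(*aligned)]


def candidate_primers(aligned: Sequence[str], lengths=range(18, 23), tm_range=(54, 66),
                      gc_range=(40, 65)) -> List[Tuple[int, int, str]]:
    """(start, length, site) for windows passing the A11 ranges and A13 steps 3-4."""
    same = conserved_columns(aligned)
    ref = aligned[0]
    hits = []
    for n in lengths:
        for start in range(len(ref) - n + 1):
            site = ref[start:start + n]
            if not all(same[start:start + n]):
                continue  # step 3: the site must not overlap any mismatching base
            if not gc_range[0] <= gc_percent(site) <= gc_range[1]:
                continue
            if not tm_range[0] <= wallace_tm(site) <= tm_range[1]:
                continue
            if all(same[:start]):
                continue  # step 4: no discriminating base in the amplified segment
            hits.append((start, n, site))
    return hits


core = "ACGT" * 5                        # toy 20-mer: 50% GC, Wallace Tm 60
alignment = ["AA" + core, "AT" + core]   # toy alignment differing only at column 1
print(candidate_primers(alignment))
```

When no window survives step 4, the sketch simply returns an empty list; the iterative relocation of the search described in step 4 is left to the caller.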

<Analysis System>

<B1> A method of quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database, wherein the method comprises:
(1) providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner;
(2) determining the nucleic acid sequence comprised in the nucleic acid sample; and
(3) calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject.

<B2> The method of item <B1>, wherein the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR) and the step (2) determines the nucleic acid sequence by a single sequencing.

<B3> The method of item <B2>, wherein the single sequencing is characterized in that at least one of the sequences used as a primer in amplification from the nucleic acid sample into a sample for sequencing has the same sequence as a nucleic acid sequence encoding a C region or a complementary strand thereof.

<B4> The method of item <B2> or <B3>, wherein the single sequencing is characterized in being performed with a common adaptor primer.

<B5> The method of any one of items <B1>-<B4>, wherein the unbiased amplification is not V region specific amplification.

<B6> The method of any one of items <B1>-<B5>, wherein the repertoire is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence.

<B7> A method of analyzing a disease, disorder or condition of the subject based on the TCR or BCR repertoire derived by the method of any one of items <B1>-<B6>.

<B8> The method of item <B7>, wherein the disease, disorder or condition of the subject is selected from the group consisting of hematological tumor and colorectal cancer.

<B9> A method of treating or preventing the disease, disorder or condition of the subject determined by the method of item <B7> or <B8>, comprising: quantitatively associating the disease, disorder or condition of the subject with the TCR or BCR repertoire; and selecting means for suitable treatment or prevention from the quantitative association.

<B10> The method of item <B9>, wherein the disease, disorder or condition of the subject is selected from the group consisting of hematological tumor and colorectal cancer.

<B11> A system for quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database, wherein the system comprises:
(1) a kit for providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner;
(2) an apparatus for determining the nucleic acid sequence comprised in the nucleic acid sample; and
(3) an apparatus for calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject.

<B12> The system of item <B11>, wherein the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR) and the apparatus (2) determines the nucleic acid sequence by a single sequencing.

<B13> The system of item <B12>, wherein the single sequencing is characterized in that at least one of the sequences used as a primer in amplification from the nucleic acid sample to a sample for sequencing has the same sequence as a C region.

<B14> The system of item <B12> or <B13>, wherein the single sequencing is characterized in being performed with a common adaptor primer.

<B15> The system of any one of items <B11>-<B14>, wherein the unbiased amplification is not V region specific amplification.

<B16> The system of any one of items <B11>-<B15>, wherein the repertoire is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence.

<B17> A system for analyzing a disease, disorder or condition of the subject, comprising the system of any one of items <B11>-<B16> and means for analyzing the disease, disorder or condition of the subject based on the TCR or BCR repertoire derived by the system.

<B18> The system of item <B17>, wherein the disease, disorder or condition of the subject is selected from the group consisting of hematological tumor and colorectal cancer.

<B19> A system for treating or preventing the disease, disorder or condition of the subject determined by the system of item <B17> or <B18>, comprising: means for quantitatively associating the disease, disorder or condition of the subject with the TCR or BCR repertoire; and means for selecting means for suitable treatment or prevention from the quantitative association.

<B20> The system of item <B19>, wherein the disease, disorder or condition of the subject is selected from the group consisting of hematological tumor and colorectal cancer.

<B21> A monoclonal T cell related to T cell large granular lymphocytic leukemia (T-LGL), expressing a TCRα comprising TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same and/or a TCRβ comprising TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same.

<B22> Use of TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same in TCRα and/or TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same in TCRβ as a diagnostic indicator of T cell large granular lymphocytic leukemia (T-LGL).

<B23> A method of detecting T cell large granular lymphocytic leukemia (T-LGL), comprising detecting TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same in TCRα and/or TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same in TCRβ.

<B24> A detecting agent for TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same in TCRα and/or a detecting agent for TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same in TCRβ.

<B25> A diagnostic agent for T cell large granular lymphocytic leukemia (T-LGL) comprising a detecting agent for TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same in TCRα and/or a detecting agent for TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same in TCRβ.

<B26> A peptide which is a novel invariant TCR, comprising any one of the sequences set forth in SEQ ID NOs: 1627-1647.

<B27> An indicator peptide of a mucosal-associated invariant T (MAIT) cell, comprising a sequence selected from the group consisting of SEQ ID NOs: 1648-1651, 1653-1654, 1666-1667, 1844-1848, and 1851.

<B28> A nucleic acid encoding the peptide of item <B27>.

<B29> Use of the peptide of item <B27> or <B28> or a nucleic acid encoding said peptide as a diagnostic indicator of colorectal cancer.

<B30> An indicator peptide of a natural killer T cell (NKT), comprising the sequence set forth in SEQ ID NO: 1668.

<B31> A nucleic acid encoding the peptide of item <B30>.

<B32> Use of the peptide of item <B30> or <B31> or a nucleic acid encoding said peptide as a diagnostic indicator of colorectal cancer.

<B33> A colorectal cancer-specific peptide, comprising a sequence selected from the group consisting of SEQ ID NOs: 1652, 1655-1665, 1669-1843, 1849-1850, and 1852-1860.

<B34> A nucleic acid encoding the peptide of item <B33>.

<B35> Use of the peptide of item <B33> or <B34> or a nucleic acid encoding said peptide as a diagnostic indicator of colorectal cancer.

<B36> A colorectal cancer-specific peptide, comprising a sequence selected from the group consisting of SEQ ID NOs: 1861-1865 and 1867-1909.

<B37> A nucleic acid encoding the peptide of item <B36>.

<B38> Use of the peptide of item <B36> or <B37> or a nucleic acid encoding said peptide as a diagnostic indicator of colorectal cancer.

<B39> A cell population inducing a T cell at a high frequency, T cell line, or recombinantly expressed T cell having the peptide of item <B33>, <B34>, <B36> or <B37> or a nucleic acid sequence encoding said peptide.

<B40> A therapeutic agent for colorectal cancer, comprising the cell population, T cell line, or T cell of item <B39>.

<B41> A method of treating or preventing colorectal cancer by using the cell population, T cell line, or T cell of item <B39>.

<B42> A method of detecting a usage frequency of a V gene by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B43> A method of detecting a usage frequency of a J gene by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B44> A method of detecting a usage frequency of subtype frequency analysis (BCR) by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B45> A method of analyzing a pattern of CDR3 sequence lengths by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B46> A method of analyzing clonality of a TCR or a BCR by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B47> A method of extracting an overlapping read by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B48> A method of searching for a disease-specific TCR or BCR clone by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B49> A method of analyzing a subject with a diversity index by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B50> A method of assisting analysis of a subject with a diversity index by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B51> The method of item <B49> or <B50>, wherein the diversity index is used as an indicator for measuring a degree of recovery of an immune system after bone marrow transplantation or as an indicator for detecting an abnormality in immune system cells accompanying a hematopoietic tumor.

<B52> The method of item <B49> or <B50>, wherein the diversity index is selected from the group consisting of the Shannon-Wiener diversity index (H′), Simpson's diversity index (λ, 1−λ, or 1/λ), Pielou's evenness index (J′) and the Chao1 index.

<B53> A method of analyzing a subject with a similarity index by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B54> A method of assisting analysis of a subject with a similarity index by using the method of any one of items <B1>-<B10> or the system of any one of items <B11>-<B20>.

<B55> The method of item <B53> or <B54>, wherein the similarity index is used for assessing a degree of similarity of repertoires between matching and mismatching HLA types, or for assessing a degree of similarity of repertoires between a recipient and a donor after bone marrow transplantation.

<B56> The method of item <B53> or <B54>, wherein the similarity index is selected from the group consisting of the Morisita-Horn index, Kimoto's Cπ index, and Pianka's α index.

<B57> The method of item <B1>, wherein the step (1) comprises the following steps:
(1-1) synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template;
(1-2) synthesizing a double stranded complementary DNA by using the complementary DNA as a template;
(1-3) synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA;
(1-4) performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer,

wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified;

(1-5) performing a second PCR amplification reaction by using a PCR amplicon of (1-4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; and
(1-6) performing a third PCR amplification reaction by using a PCR amplicon of (1-5), an added common adaptor primer in which a nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence and a molecule identification (MID Tag) sequence are added to a third TCR or BCR C region specific sequence; wherein

the third TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the second TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified,

the first additional adaptor nucleic acid sequence is a sequence suitable for binding to a DNA capturing bead and for an emPCR reaction,

the second additional adaptor nucleic acid sequence is a sequence suitable for an emPCR reaction, and

the molecule identification (MID Tag) sequence is a sequence for imparting uniqueness such that an amplicon can be identified.
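The diversity indices of item <B52> and two of the similarity indices of item <B56> can be computed from clonotype counts. The following is a sketch using common textbook definitions (H′ = −Σ pᵢ ln pᵢ; λ = Σ pᵢ²; J′ = H′ / ln S; Chao1 = S_obs + F₁² / (2F₂), where F₁ and F₂ are the numbers of clonotypes seen once and twice, with the bias-corrected form F₁(F₁ − 1)/2 when F₂ = 0; Morisita-Horn and Pianka's overlap on relative frequencies). The exact estimator variants used in the invention are not specified here, Kimoto's Cπ index is not implemented, and the function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Mapping


def _proportions(counts: Mapping[Hashable, int]) -> Dict[Hashable, float]:
    """Relative frequencies of the clonotypes with a positive count."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items() if v > 0}


def shannon(counts) -> float:
    """Shannon-Wiener H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in _proportions(counts).values())


def simpson(counts) -> float:
    """Simpson's lambda = sum(p_i^2); 1 - lambda and 1 / lambda follow directly."""
    return sum(p * p for p in _proportions(counts).values())


def pielou(counts) -> float:
    """Pielou's J' = H' / ln S; returns 0.0 when S <= 1 (a convention chosen here)."""
    s = sum(1 for v in counts.values() if v > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def chao1(counts) -> float:
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    observed = [v for v in counts.values() if v > 0]
    f1 = sum(1 for v in observed if v == 1)
    f2 = sum(1 for v in observed if v == 2)
    if f2:
        return len(observed) + f1 * f1 / (2 * f2)
    return len(observed) + f1 * (f1 - 1) / 2


def morisita_horn(x, y) -> float:
    """2 * sum(p_i q_i) / (sum p_i^2 + sum q_i^2) on relative frequencies."""
    px, py = _proportions(x), _proportions(y)
    shared = sum(p * py.get(k, 0.0) for k, p in px.items())
    return 2 * shared / (sum(p * p for p in px.values()) + sum(q * q for q in py.values()))


def pianka(x, y) -> float:
    """Pianka's overlap: sum(p_i q_i) / sqrt(sum p_i^2 * sum q_i^2)."""
    px, py = _proportions(x), _proportions(y)
    shared = sum(p * py.get(k, 0.0) for k, p in px.items())
    return shared / math.sqrt(sum(p * p for p in px.values()) * sum(q * q for q in py.values()))


donor = {"clone1": 5, "clone2": 5}
recipient = {"clone1": 1}
print(shannon(donor), simpson(donor), chao1(donor), morisita_horn(donor, recipient))
```

Both similarity indices equal 1 for identical relative frequencies and 0 for repertoires with no clonotype in common, which is what makes them usable for the donor-recipient comparison of item <B55>.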

<B58> The system of item <B11>, wherein the kit (1) comprises the following:
(1-1) means for synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template;
(1-2) means for synthesizing a double stranded complementary DNA by using the complementary DNA as a template;
(1-3) means for synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA;
(1-4) means for performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer,

wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified;

(1-5) means for performing a second PCR amplification reaction by using a PCR amplicon of (1-4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; and
(1-6) means for performing a third PCR amplification reaction by using a PCR amplicon of (1-5), an added common adaptor primer in which a nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence and a molecule identification (MID Tag) sequence are added to a third TCR or BCR C region specific sequence; wherein

the third TCR or BCR C region specific primer is designed to be a complete match with the TCR or BCR C region at a position downstream of the sequence of the second TCR or BCR C region specific primer, not to be homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream of the primer when amplified,

the first additional adaptor nucleic acid sequence is a sequence suitable for binding to a DNA capturing bead and for an emPCR reaction,

the second additional adaptor nucleic acid sequence is a sequence suitable for an emPCR reaction, and

the molecule identification (MID Tag) sequence is a sequence for imparting uniqueness such that an amplicon can be identified.
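As an illustration of how the MID Tag sequence of (1-6) allows amplicons from several subjects to be pooled and later told apart, the following is a minimal Python sketch. It assumes exact, equal-length MID tags located at the 5′ end of each read, optionally preceded by a known adaptor; the function name demultiplex and the tag values are hypothetical, and no tolerance for sequencing errors in the tag is implemented.

```python
from collections import defaultdict


def demultiplex(reads, mid_tags, adaptor=""):
    """Group reads by sample using an exact MID tag match at the 5' end.

    reads:    iterable of read sequences, 5' -> 3'.
    mid_tags: dict of sample name -> MID tag (hypothetical, equal-length tags).
    adaptor:  sequence expected before the MID tag ("" if already removed).
    Returns a dict of sample name -> reads with adaptor and tag removed;
    reads without an exact match are kept unchanged under the key None.
    """
    groups = defaultdict(list)
    for read in reads:
        if not read.startswith(adaptor):
            groups[None].append(read)
            continue
        rest = read[len(adaptor):]
        for sample, tag in mid_tags.items():
            if rest.startswith(tag):
                # The remainder now begins with the third C region specific sequence.
                groups[sample].append(rest[len(tag):])
                break
        else:
            groups[None].append(read)
    return dict(groups)


if __name__ == "__main__":
    mids = {"sample_1": "ACGTACGTAC", "sample_2": "TGCATGCATG"}
    print(demultiplex(["ACGTACGTACGGG", "TGCATGCATGTTT", "CCCCCC"], mids))
    # {'sample_1': ['GGG'], 'sample_2': ['TTT'], None: ['CCCCCC']}
```

Equal-length tags avoid the case where one tag is a prefix of another; with variable-length tags the longest matching tag would have to be tried first.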

<B58-2> The method of item <B57> or the system of item <B58>, wherein, for a BCR, the C region specific primer comprises a sequence that is a complete match with an isotype C region of interest selected from the group consisting of IgM, IgA, IgG, IgE and IgD and is not homologous with other C regions, and, for IgG or IgA, is a sequence that is a complete match with one of the subtypes IgG1, IgG2, IgG3 and IgG4 or with one of the subtypes IgA1 and IgA2, respectively; or, for a TCR, the C region specific primer is a sequence that is a complete match with the C region of a chain of interest selected from the group consisting of the α chain, β chain, γ chain and δ chain and is not homologous with other C regions.

<B58-3> The method of item <B57> or <B58-2> or the system of item <B58> or <B58-2>, wherein a portion of a sequence that is a complete match with all C region allelic sequences of the same isotype in the database is selected for the C region specific primer.

<B58-4> The method of any one of items <B57> and <B58-2>-<B58-3> or the system of any one of <B58>-<B58-3>, wherein the common adaptor primer is designed such that the primer is unlikely to form homodimer and intramolecular hairpin structures and can stably form a double strand, and is designed not to be highly homologous with any TCR genetic sequence in the database and to have the same level of melting temperature (Tm) as the C region specific primer.

<B58-5> The method of any one of items <B57> and <B58-2>-<B58-4> or the system of any one of <B58>-<B58-4>, wherein a common adaptor primer designed not to have homodimer and intramolecular hairpin structures and not to have homology with other genes including BCR or TCR genes is selected.

<B58-6> The method of any one of items <B57> and <B58-2>-<B58-5> or the system of any one of <B58>-<B58-5>, wherein the common adaptor primer is P20EA (SEQ ID NO: 2) and/or P10EA (SEQ ID NO: 3).

<B58-7> The method of any one of items <B57> and <B58-2>-<B58-6> or the system of any one of <B58>-<B58-6>, wherein the first, second and third TCR or BCR C region specific primers are each independently a primer for BCR repertoire analysis, the primer being selected to be a sequence that is a complete match with each isotype C region of IgM, IgG, IgA, IgD or IgE, a complete match with the subtypes for IgG and IgA, and not homologous with other sequences comprised in the database, and to comprise a mismatching base between subtypes downstream of the primer, and

wherein the common adaptor primer sequence is designed such that the sequence has a base length suitable for amplification, is unlikely to form homodimer and intramolecular hairpin structures, is able to stably form a double strand, is not highly homologous with any TCR genetic sequence in the database, and has the same level of Tm as the C region specific primer.

<B58-8> The method of any one of items <B57> and <B58-2>-<B58-7> or the system of any one of <B58>-<B58-7>, wherein the first, second and third TCR or BCR C region specific primers are each independently a primer for TCR or BCR repertoire analysis, each primer being selected to be a sequence that is a complete match with 1 type of α chain (TRAC), 2 types of β chains (TRBC1 and TRBC2), 2 types of γ chains (TRGC1 and TRGC2), and 1 type of δ chain (TRDC1) and is not homologous with other sequences comprised in the database, and to comprise a mismatching base between subtypes downstream of the primer,

wherein the common adaptor primer sequence is designed such that the sequence has a base length suitable for amplification, is unlikely to form homodimer and intramolecular hairpin structures, is able to stably form a double strand, is not highly homologous with any TCR genetic sequence in the database, and has the same level of Tm as the C region specific primer.

<B58-9> The method of any one of items <B57> and <B58-2>-<B58-8> or the system of any one of <B58>-<B58-8>, wherein the third TCR or BCR C region specific primer is set in a region up to about 150 bases from the 5′ terminal side of a C region, and the first and second TCR or BCR C region specific primers are set in a region from the 5′ terminal side of a C region to about 300 bases.

<B58-10> The method of any one of items <B57> and <B58-2>-<B58-9> or the system of any one of <B58>-<B58-9>, wherein the first, second and third TCR or BCR C region specific primers are each independently for BCR quantitative analysis,

wherein separate specific primers are set for 5 types of isotype sequences, and the primers are designed to completely match a target sequence and to ensure a mismatch of 5 bases or more with other isotypes, and are designed to be a complete match with all subtypes such that one type of primer is compatible with each of the similar IgG subtypes (IgG1, IgG2, IgG3 and IgG4) or IgA subtypes (IgA1 and IgA2).
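The primer selection criteria of <B58-10>, a complete match with the target isotype and a mismatch of 5 bases or more with every other isotype at the aligned position, can be checked mechanically. The minimal Python sketch below also applies the length, melting temperature and %GC ranges given in <B58-11>. It assumes the isotype sequences have already been aligned so that equal-length segments can be compared, and it estimates Tm with the simple Wallace rule only as a stand-in, since the document does not specify the Tm model. All function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm estimate by the Wallace rule: 2 deg C per A/T, 4 deg C per G/C.

    Assumption for illustration only; the Tm model of the primer design
    software is not stated in the document.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def mismatches(a: str, b: str) -> int:
    """Number of mismatching bases between two aligned, equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def screen_primer(primer, target_segment, other_isotype_segments,
                  min_mismatch=5, length=(18, 22), tm=(54.0, 66.0), gc=(40.0, 65.0)):
    """Return the reasons a candidate primer fails; an empty list means it passes.

    target_segment: aligned segment of the target isotype C region.
    other_isotype_segments: dict of isotype name -> aligned segment at the same position.
    """
    problems = []
    if primer.upper() != target_segment.upper():
        problems.append("not a complete match with the target isotype")
    if not length[0] <= len(primer) <= length[1]:
        problems.append("length out of range")
    if not gc[0] <= gc_percent(primer) <= gc[1]:
        problems.append("%GC out of range")
    if not tm[0] <= wallace_tm(primer) <= tm[1]:
        problems.append("Tm out of range")
    for isotype, segment in other_isotype_segments.items():
        if mismatches(primer, segment) < min_mismatch:
            problems.append(f"fewer than {min_mismatch} mismatches with {isotype}")
    return problems
```

To confirm that a single primer is compatible with all IgG (or IgA) subtypes, screen_primer can be called once with each subtype segment as target_segment.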

<B58-11> The method of any one of items <B57> and <B58-2>-<B58-10> or the system of any one of <B58>-<B58-10>, wherein parameters in primer design are set to: a base sequence length of 18-22 bases; a melting temperature of 54-66° C.; and a % GC (% guanine-cytosine content) of 40-65%.

<B58-12> The method of any one of items <B57> and <B58-2>-<B58-11> or the system of any one of <B58>-<B58-11>, wherein parameters in primer design are set to: a base sequence length of 18-22 bases; a melting temperature of 54-66° C.; a % GC (% guanine-cytosine content) of 40-65%; a self-annealing score of 26; a self-end annealing score of 10; and a secondary structure score of 28.

<B58-13> The method of any one of items <B57> and <B58-2>-<B58-12> or the system of any one of <B58>-<B58-12>, wherein sequences of the first, second and third TCR or BCR C region specific primers are determined under the following conditions:
1. a plurality of subtype sequences and/or allelic sequences are uploaded into base sequence analysis software and aligned;
2. primer design software is used to search for a plurality of primers satisfying the parametric conditions in a C region;
3. a primer in a region without a mismatching base in the sequences aligned in 1 is selected; and
4.
the presence of a plurality of mismatching sequences for each subtype and/or allele downstream of the primer determined in 3 is confirmed, and if there is no such sequence, a primer is searched for further upstream, which is repeated as needed.

<B58-14> The method of any one of items <B57> and <B58-2>-<B58-13> or the system of any one of <B58>-<B58-13>, wherein, taking the first base of the first codon of a C region sequence produced by splicing as a baseline, the first TCR or BCR C region specific primer is set at a position at bases 41-300, the second TCR or BCR C region specific primer is set at a position at bases 21-300, and the third TCR or BCR C region specific primer is set at a position within 150 bases, and the positions comprise a mismatching site in a subtype and/or allele.

<B58-15> The method of any one of items <B57> and <B58-2>-<B58-14> or the system of any one of <B58>-<B58-14>, wherein the first TCR or BCR C region specific primer has the following structure: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CA1 (SEQ ID NO: 35) or CB1 (SEQ ID NO: 37).

<B58-16> The method of any one of items <B57> and <B58-2>-<B58-15> or the system of any one of <B58>-<B58-15>, wherein the second TCR or BCR C region specific primer has the following structure: CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CA2 (SEQ ID NO: 35), or CB2 (SEQ ID NO: 37).

<B58-17> The method of any one of items <B57> and <B58-2>-<B58-16> or the system of any one of <B58>-<B58-16>, wherein the third TCR or BCR C region specific primer has the following structure: CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16) or CE3-GS (SEQ ID NO: 19).

<B58-18> The method of any one of items <B57> and <B58-2>-<B58-17> or the system of any one of <B58>-<B58-17>, wherein each of the TCR or BCR C region
specific primers is provided in a set compatible with all TCR or BCR subclasses.

<B58-19> A method or system of performing gene analysis using a sample manufactured by the method of any one of items <B57> and <B58-2>-<B58-18> or the system of any one of <B58>-<B58-18>.

<B58-20> The method or system of item <B58-19>, wherein the gene analysis is the quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR).

<B59> The method of any one of items <B57> and <B58-2>-<B58-20> or the system of any one of <B58>-<B58-20>, wherein (3) derivation of the TCR or BCR repertoire is accomplished by a method comprising the following steps:
(3-1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(3-2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3-3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(3-4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(3-5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(3-6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (3-5) to derive the TCR or BCR repertoire.

<B60> The system of any one of items <B11>-<B20>, <B58>-<B58-20> and <B59>, wherein (3) an apparatus for deriving the TCR or BCR repertoire comprises the following:
(3-1) means for providing a reference database for each gene region comprising at least one of a V region, a D region, a J
region and optionally a C region;
(3-2) means for providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3-3) means for searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(3-4) means for assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(3-5) means for translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(3-6) means for calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (3-5) to derive the TCR or BCR repertoire.

<B60-2> The method of any one of items <B57>, <B58-2>-<B58-20>, and <B59> or the system of any one of <B58>-<B58-20>, <B59> and <B60>, wherein the gene region comprises all of the V region, the D region, the J region and optionally the C region.

<B60-3> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-2>, wherein the reference database is a database with a unique ID assigned to each sequence.

<B60-4> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-3> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-3>, wherein the input sequence set is an unbiased sequence set.

<B60-5> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-4> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-4>, wherein the sequence set is trimmed.

<B60-6> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-5> or the system of any one of <B58>-<B58-20>,
<B59> and <B60>-<B60-5>, wherein the trimming is accomplished by the steps of: deleting low quality regions from both ends of a read; deleting a region matching an adaptor sequence over 10 bp or more from both ends of the read; and using the read as a high quality read in analysis when the remaining length is 200 bp or more (TCR) or 300 bp or more (BCR).

<B60-7> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-6> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-6>, wherein low quality refers to a 7 bp moving average of the QV value of less than 30.

<B60-8> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-7> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-7>, wherein the approximate sequence is the closest sequence.

<B60-9> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-8> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-8>, wherein the approximate sequence is determined by a ranking of 1. number of matching bases, 2. kernel length, 3. score, and
4.
alignment length.

<B60-10> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-9> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-9>, wherein the homology search is conducted under a condition tolerating random mutations scattered throughout the sequence.

<B60-11> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-10> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-10>, wherein the homology search uses, compared to a default condition, at least one of (1) a shorter window size, (2) a reduced mismatch penalty, (3) a reduced gap penalty, and (4) a ranking in which the number of matching bases is the top priority indicator.

<B60-12> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-11> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-11>, wherein the homology search is carried out under the following conditions in BLAST or FASTA:

V region: mismatch penalty = −1, shortest alignment length = 30, and shortest kernel length = 15;

D region: word length = 7 (for BLAST) or K-tup = 3 (for FASTA), mismatch penalty = −1, gap penalty = 0, shortest alignment length = 11, and shortest kernel length = 8;

J region: mismatch penalty = −1, shortest hit length = 18, and shortest kernel length = 10; and

C region: shortest hit length = 30 and shortest kernel length = 15.
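The read trimming of <B60-6> and <B60-7> can be sketched as follows. Two details are assumptions not fixed by the text: a base is kept if it lies between the first and the last 7 bp window whose mean QV is 30 or more, and the adaptor is removed only when it matches exactly (no mismatches) over 10 bp or more at either end of the read. Function names and the example adaptor are hypothetical.

```python
from statistics import mean

WINDOW = 7                          # moving-average window (bp), per <B60-7>
MIN_QV = 30                         # windows with mean QV below this are low quality
MIN_ADAPTOR_MATCH = 10              # minimum adaptor overlap (bp), per <B60-6>
MIN_LENGTH = {"TCR": 200, "BCR": 300}


def quality_trim(seq, quals, window=WINDOW, min_qv=MIN_QV):
    """Keep the span from the first to the last window whose mean QV >= min_qv."""
    n = len(seq)
    if n < window:
        return "", []
    avgs = [mean(quals[i:i + window]) for i in range(n - window + 1)]
    good = [i for i, a in enumerate(avgs) if a >= min_qv]
    if not good:
        return "", []
    start, end = good[0], good[-1] + window   # end is exclusive
    return seq[start:end], quals[start:end]


def longest_overlap(prefix_of, suffix_of, min_len):
    """Largest k >= min_len with suffix_of[-k:] == prefix_of[:k]; 0 if none."""
    for k in range(min(len(prefix_of), len(suffix_of)), min_len - 1, -1):
        if suffix_of[-k:] == prefix_of[:k]:
            return k
    return 0


def adaptor_trim(seq, adaptor, min_match=MIN_ADAPTOR_MATCH):
    """Remove exact adaptor overlaps of at least min_match bp from both ends."""
    k5 = longest_overlap(seq, adaptor, min_match)   # read starts with adaptor's 3' part
    seq = seq[k5:]
    k3 = longest_overlap(adaptor, seq, min_match)   # read ends with adaptor's 5' part
    return seq[:len(seq) - k3]


def trim_read(seq, quals, adaptor, receptor="TCR"):
    """Return the high quality read, or None if it is shorter than the minimum length."""
    seq, _ = quality_trim(seq, quals)
    seq = adaptor_trim(seq, adaptor)
    return seq if len(seq) >= MIN_LENGTH[receptor] else None
```

Quality trimming is applied before adaptor trimming here, in the order the steps are listed in <B60-6>.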

<B60-13> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-12> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-12>, wherein the D region is classified by a frequency of appearance of the amino acid sequence.

<B60-14> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-13> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-13>, wherein, when there is a reference database for the D region, a combination of a result of the homology search with the nucleic acid sequence of CDR3 and a result of amino acid sequence translation is used as a classification result in the step (5).

<B60-15> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-14> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-14>, wherein only the frequency of appearance of the amino acid sequence is used for classification when there is no reference database for the D region in the step (5).

<B60-16> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-15> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-15>, wherein the frequency of appearance is counted in a unit of a gene name and/or a unit of an allele.

<B60-17> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-16> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-16>, wherein the step (4) comprises the step of assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides.

<B60-18> The method of any one of items <B57>, <B58-2>-<B58-20>, <B59>, and <B60-2>-<B60-17> or the system of any one of <B58>-<B58-20>, <B59> and <B60>-<B60-17>, wherein the step (5) comprises translating the nucleic acid sequence of the CDR3 into an amino acid sequence and classifying a D region by using the amino acid sequence.

<B60-19> The system of any one of items
<B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-18>, wherein (3) an apparatus for deriving the TCR or BCR repertoire comprises:
(3-1) means for providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(3-2) means for providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3-3) means for searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(3-4) means for assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(3-5) means for translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(3-6) means for calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.

<B60-20> The system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-19>, wherein processing of a method of analyzing the TCR or BCR repertoire is implemented by a computer program causing a computer to execute processing comprising the following steps:
(1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) assigning the V region and the J region
for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.

<B60-21> The system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-20> for causing a computer to execute processing of a method of analyzing a TCR or BCR repertoire, the method comprising the following steps:
(1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region;
(2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length;
(3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele;
(4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning;
(5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and
(6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region, or a frequency of appearance of a combination thereof, in the input sequence set to derive the TCR or BCR repertoire.
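The assignment, translation and counting steps can be sketched in a few lines. This minimal Python sketch assumes the homology search (for example BLAST or FASTA under the conditions of <B60-12>) has already been run and its hits parsed into Hit records, and that the CDR3 nucleotide sequence has already been extracted using the V and J alignments as in <B60-17>. The closest allele is chosen with the ranking of <B60-9>, the CDR3 is translated with the standard genetic code, and frequencies are counted per allele or per gene name as in <B60-16>. The Hit fields, function names and allele names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One homology-search hit of a read against a reference allele (hypothetical layout)."""
    allele: str              # reference allele ID, e.g. "TRBV5-1*01"
    matches: int             # number of matching bases
    kernel_length: int       # kernel length as reported by the search
    score: float
    alignment_length: int


def best_hit(hits):
    """Closest allele: rank by matching bases, then kernel length, score, alignment length."""
    return max(hits, key=lambda h: (h.matches, h.kernel_length, h.score, h.alignment_length))


# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO_ACIDS))


def translate(nt):
    """Translate in frame 1; unknown codons become 'X', a trailing partial codon is dropped."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def derive_repertoire(reads, unit="allele"):
    """reads: iterable of (v_hits, j_hits, cdr3_nt); returns relative (V, J, CDR3 aa) frequencies."""
    counts = Counter()
    for v_hits, j_hits, cdr3_nt in reads:
        if not v_hits or not j_hits:
            continue  # reads without a V or J assignment are skipped in this sketch
        v, j = best_hit(v_hits).allele, best_hit(j_hits).allele
        if unit == "gene":
            # IMGT-style names: the gene name is the part before "*"
            v, j = v.split("*")[0], j.split("*")[0]
        counts[(v, j, translate(cdr3_nt))] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}
```

Using the tuple of the four indicators as the sort key reproduces the priority order of <B60-9>: a later indicator is consulted only when all earlier ones are tied.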

<Examples of Application in Analysis>

<C1>

A method of applying a cancer idiotype peptide sensitization immune cell therapeutic method to a subject, the method comprising:

(1) analyzing a T cell receptor (TCR) or B cell receptor (BCR) repertoire of the subject by the method of any one of items <B1>-<B10>, <B57>, <B58-2>-<B58-20>, <B59>, and <B60>-<B60-21> or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21>;
(2) determining a TCR or BCR derived from a cancer cell of the subject based on a result of the analysis, wherein the determining is done by selecting a high-ranking sequence in a frequency-of-presence ranking of TCR or BCR genes derived from the cancer cell of the subject as the TCR or BCR derived from the cancer cell;
(3) determining an amino acid sequence of a candidate HLA test peptide based on the determined cancer-derived TCR or BCR, wherein the determining is performed based on a score calculated by using an HLA binding peptide prediction algorithm;
(4) synthesizing the determined peptide; and optionally
(5) administering therapy by using the synthesized peptide.
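Steps (2) and (3) can be illustrated as ranking receptor sequences by frequency and then enumerating overlapping peptides of the selected sequence for scoring. In the minimal Python sketch below, score_fn is a hypothetical placeholder for scores obtained from an HLA binding peptide prediction algorithm such as those named in <C2>; no real predictor API is called. The default peptide length of 9 is an assumption (a common length for HLA class I binders), and the function names are illustrative.

```python
def top_clonotypes(freqs, n=1):
    """Return the n most frequent clonotypes, highest frequency first."""
    return sorted(freqs, key=freqs.get, reverse=True)[:n]


def candidate_peptides(protein, length=9):
    """All overlapping peptides of the given length within an amino acid sequence."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def rank_peptides(protein, score_fn, length=9, top=5):
    """Rank candidate peptides by score_fn, highest first.

    score_fn stands in for binding scores from an external HLA binding
    predictor; it is a hypothetical callable supplied by the caller.
    """
    return sorted(candidate_peptides(protein, length), key=score_fn, reverse=True)[:top]
```

In practice the amino acid sequence passed to rank_peptides would be the translated variable region of the top-ranked receptor, and the returned peptides would be the candidates for synthesis in step (4).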

<C2>

The method of item <C1>, wherein the candidate HLA test peptide of the step (3) is determined by using BIMAS, SYFPEITHI, RANKPEP or NetMHC.

<C3> <Improved CTL Method>

The method of item <C1> or <C2>, wherein the method comprises, after the step (4), the steps of: mixing the peptide, an antigen presenting cell or a dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture; and administering the cultured mixture to a patient.

<C4> <DC Vaccination Therapeutic Method>

The method of any one of items <C1>-<C3>, comprising, after the step (4), the steps of: mixing the peptide with a dendritic cell derived from the subject and culturing the mixture; and administering the cultured mixture to a patient.

<C5> <Patient Autoimmune Cell Therapeutic Method>

The method of any one of items <C1>-<C4>, wherein the method comprises, after the step (4), the steps of: mixing the peptide, the antigen presenting cell or the dendritic cell derived from the subject and a CD8⁺ T cell derived from the subject and culturing the mixture to produce a CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture; mixing the peptide with the dendritic cell derived from the subject and culturing the mixture to produce a dendritic cell-peptide mixture; and administering the CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture and the dendritic cell-peptide mixture to a patient.

<D1> <Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by In Vitro Antigen Stimulation>

A method of isolating a cancer specific TCR gene by an in vitro antigen stimulation, comprising:

(A) mixing an antigen peptide or antigen protein derived from a subject or the determined peptide of any one of items <C1>-<C5> or a lymphocyte derived from the subject, an inactivated cancer cell derived from the subject, and a T lymphocyte derived from the subject, and culturing the mixture to produce a tumor specific T cell;
(B) analyzing a TCR of the tumor specific T cell by the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21>; and
(C) isolating a desired tumor specific T cell based on a result of the analyzing.

<D1-1>

The method of item <D1>, wherein the step (A) is a step of mixing the inactivated cancer cell derived from the subject and the antigen peptide or antigen protein derived from the subject with the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<D1-2>

The method of any one of items <D1>-<D1-1>, wherein the step (A) is a step of mixing the lymphocyte derived from the subject, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<D1-3>

The method of any one of items <D1>-<D1-2>, wherein the step (A) is a step of mixing the determined peptide of item <C1>, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<D2> <Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by Searching for a Common Sequence>

A method of isolating a cancer specific TCR gene by searching for a common sequence, comprising:

(A) isolating a lymphocyte or cancer tissue from subjects having a common HLA;
(B) analyzing a TCR of the tumor specific T cell by the method of item <B1> for the lymphocyte or cancer tissue; and
(C) isolating a T cell having a sequence in common with the tumor specific T cell.
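Searching for a common sequence in step (C) can be sketched as counting, for each clonotype (for example a CDR3 amino acid sequence), the number of HLA-matched subjects in whose repertoire it appears. In this minimal Python sketch the function name, the clonotype keys and the threshold min_subjects are illustrative assumptions.

```python
from collections import Counter


def shared_clonotypes(repertoires, min_subjects=2):
    """Clonotypes found in at least min_subjects subjects.

    repertoires: dict of subject ID -> iterable of clonotype keys
                 (e.g. CDR3 amino acid sequences) from HLA-matched subjects.
    Returns a dict of clonotype -> number of subjects in which it was found.
    """
    presence = Counter()
    for clonotypes in repertoires.values():
        presence.update(set(clonotypes))  # count each clonotype once per subject
    return {c: n for c, n in presence.items() if n >= min_subjects}
```

Counting each clonotype once per subject keeps a single strongly expanded clone in one subject from being mistaken for a sequence shared across subjects.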

<E1> <CPC>

A cell processing therapeutic method, comprising:

A) collecting a T lymphocyte from a patient;
B) analyzing TCRs based on the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21> after applying antigen stimulation to the T lymphocyte, wherein the antigen stimulation is applied by an antigen peptide or antigen protein derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from a tumor;
C) selecting an optimal TCR and an optimal antigen from the analyzed TCRs;
D) producing a viral vector expressing the tumor specific α and β TCR chains of the TCR gene of the optimal TCR; and
E) introducing into the patient the T lymphocyte into which the tumor specific TCR gene has been introduced.

<E1-1>

The cell processing therapeutic method of item <E1>, wherein the antigen stimulation is applied with the antigen peptide or antigen protein derived from the subject.

<E1-2>

The cell processing therapeutic method of item <E1> or <E1-1>, wherein the antigen stimulation is applied with the inactivated cancer cell derived from the subject.

<E1-3>

The cell processing therapeutic method of any one of items <E1> and <E1-1>-<E1-2>, wherein the antigen stimulation is applied with the idiotype peptide derived from the tumor.

<E1-4>

The method of any one of items <E1> and <E1-1>-<E1-3>, wherein the step C) comprises selecting an antigen that is highly expressed in cancer tissue of the subject.

<E1-5>

The method of any one of items <E1> and <E1-1>-<E1-4>, wherein the step C) comprises selecting an antigen which most strongly activates a T cell in an antigen specific lymphocyte stimulation test.

<E1-6>

The method of any one of items <E1> and <E1-1>-<E1-5>, wherein the step C) comprises selecting the antigen that most increases the frequency of a specific TCR, as determined by repertoire analysis conducted based on the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21> before and after applying the antigen stimulation.
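The selection in <E1-6> compares repertoires before and after stimulation with each candidate antigen. The minimal Python sketch below takes "increases the most" to mean the largest absolute increase in the frequency of any single clonotype; a fold change would be an equally plausible reading. The function name, antigen names and data layout are hypothetical.

```python
def best_antigen(before, after_by_antigen):
    """Find the antigen whose stimulation most increases the frequency of any one clonotype.

    before: dict of clonotype -> frequency before stimulation.
    after_by_antigen: dict of antigen -> (dict of clonotype -> frequency after stimulation).
    Returns (antigen, clonotype, increase), where increase = after - before,
    or None if no post-stimulation data are given.
    """
    best = None
    for antigen, after in after_by_antigen.items():
        for clonotype, freq in after.items():
            increase = freq - before.get(clonotype, 0.0)  # undetected before -> 0
            if best is None or increase > best[2]:
                best = (antigen, clonotype, increase)
    return best
```

Clonotypes that were undetected before stimulation are treated as having frequency 0, so newly expanded clones are scored by their full post-stimulation frequency.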

<E2> <RAC of CPC>

A method of assessing efficacy and/or safety by an in vitro stimulation test using a cancer specific TCR gene isolated by the method of item <D2>.

<CC1>

A method of preparing a composition for use in a cancer idiotype peptide sensitization immune cell therapeutic method to a subject, the method comprising:

(1) analyzing a T cell receptor (TCR) or B cell receptor (BCR) repertoire of the subject by the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21>;
(2) determining a TCR or BCR derived from a cancer cell of the subject based on a result of the analysis, wherein the determining is done by selecting a high-ranking sequence in a frequency-of-presence ranking of TCR or BCR genes derived from the cancer cell of the subject as the TCR or BCR derived from the cancer cell;
(3) determining an amino acid sequence of a candidate HLA test peptide based on the determined cancer-derived TCR or BCR, wherein the determining is performed based on a score calculated by using an HLA binding peptide prediction algorithm; and
(4) synthesizing the determined peptide.

<CC2>

The method of item <CC1>, wherein the candidate HLA test peptide of the step (3) is determined by using BIMAS, SYFPEITHI, RANKPEP or NetMHC.

<CC3> <Improved CTL Method>

The method of item <CC1> or <CC2>, wherein the method comprises, after the step (4), the step of: mixing the peptide, an antigen presenting cell or a dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture.

<CC4> <DC Vaccination Therapeutic Method>

The method of any one of items <CC1>-<CC2>, comprising, after the step (4), the step of: mixing the peptide with a dendritic cell derived from the subject and culturing the mixture.

<CC5> <Patient Autoimmune Cell Therapeutic Method>

The method of any one of items <CC1>-<CC4>, wherein the method comprises, after the step (4), the steps of: mixing the peptide, the antigen presenting cell or the dendritic cell derived from the subject and a CD8⁺ T cell derived from the subject and culturing the mixture to produce a CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture; and mixing the peptide with the dendritic cell derived from the subject and culturing the mixture to produce a dendritic cell-peptide mixture.

<DD1> <Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by In Vitro Antigen Stimulation>

A method of preparing an isolated cancer specific TCR gene by an in vitro antigen stimulation, comprising:

(A) mixing an antigen peptide or antigen protein derived from a subject or the determined peptide of any one of items <CC1>-<CC5> or a lymphocyte derived from the subject, an inactivated cancer cell derived from the subject, and a T lymphocyte derived from the subject, and culturing the mixture to produce a tumor specific T cell;
(B) analyzing a TCR of the tumor specific T cell by the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21>; and
(C) isolating a desired tumor specific T cell based on a result of the analyzing.

<DD1-1>

The method of item <DD1>, wherein the step (A) is a step of mixing the inactivated cancer cell derived from the subject and the antigen peptide or antigen protein derived from the subject with the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<DD1-2>

The method of item <DD1> or <DD1-1>, wherein the step (A) is a step of mixing the lymphocyte derived from the subject, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<DD1-3>

The method of any one of items <DD1>-<DD1-2>, wherein the step (A) is a step of mixing the determined peptide of item <CC1>, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

<DD2> <Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by Searching for a Common Sequence>

A method of preparing an isolated cancer specific TCR gene by searching for a common sequence, comprising:

(A) providing a lymphocyte or cancer tissue isolated from subjects having a common HLA;

(B) analyzing a TCR of the tumor specific T cell by the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21> for the lymphocyte or cancer tissue; and

(C) isolating a T cell having a sequence in common with the tumor specific T cell.

<EE1> <CCPCC>

A method of preparing a T lymphocyte introduced with a tumor specific TCR gene for use in a cell processing therapeutic method, comprising:

A) providing a T lymphocyte collected from a patient;

B) analyzing TCRs by the method of any one of items <B1>-<B10>, <B57>, <B58>-<B58-20>, <B59>, and <B60>-<B60-21> and/or the system of any one of items <B11>-<B20>, <B58>-<B58-20>, <B59> and <B60>-<B60-21> after applying an antigen stimulation to the T lymphocyte, wherein the antigen stimulation is applied by an antigen peptide or antigen protein derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from tumor;

C) selecting an optimal TCR and an optimal antigen from the analyzed TCRs; and

D) producing a viral vector expressing the tumor specific α and β chain TCR genes of the optimal TCR.

<EE1-1>

The method of item <EE1>, wherein the antigen stimulation is applied with the antigen peptide or antigen protein derived from the subject.

<EE1-2>

The method of item <EE1> or <EE1-1>, wherein the antigen stimulation is applied with the inactivated cancer cell derived from the subject.

<EE1-3>

The method of any one of items <EE1>-<EE1-2>, wherein the antigen stimulation is applied with the idiotype peptide derived from tumor.

<EE1-4>

The method of any one of items <EE1>-<EE1-3>, wherein the step C) comprises selecting an antigen that is highly expressed in cancer tissue of the subject.

<EE1-5>

The method of any one of items <EE1>-<EE1-4>, wherein the step C) comprises selecting an antigen which most strongly activates a T cell in an antigen specific lymphocyte stimulation test.

<EE1-6>

The method of any one of items <EE1>-<EE1-5>, wherein the step C) comprises selecting the antigen that most increases the frequency of a specific TCR, as determined by repertoire analysis conducted by the method of item <B1> before and after applying the antigen stimulation.
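The selection criterion of item <EE1-6> can be illustrated with a minimal sketch, assuming each read has already been reduced to a clonotype label (for example a V-CDR3-J key) by the repertoire analysis. The function names, the fold-change score and the pseudo-frequency used to avoid division by zero are hypothetical choices made for illustration, not the method prescribed by the items above.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def clonotype_frequencies(clonotypes: Iterable[str]) -> Dict[str, float]:
    """Convert one clonotype call per read into relative frequencies."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def select_antigen(
    before: Iterable[str],
    after_by_antigen: Dict[str, Iterable[str]],
    pseudo: float = 1e-6,
) -> Tuple[str, str, float]:
    """Return (antigen, clonotype, fold_change) with the largest frequency increase.

    `pseudo` is a hypothetical pseudo-frequency added to both sides so that
    clonotypes absent before stimulation do not cause division by zero.
    """
    baseline = clonotype_frequencies(before)
    best = ("", "", 0.0)
    for antigen, reads in after_by_antigen.items():
        for clone, freq in clonotype_frequencies(reads).items():
            fold = (freq + pseudo) / (baseline.get(clone, 0.0) + pseudo)
            if fold > best[2]:
                best = (antigen, clone, fold)
    return best


# Hypothetical toy data: labels stand in for clonotype calls from repertoire analysis.
before = ["clone_A"] * 50 + ["clone_B"] * 50
after = {
    "peptide_1": ["clone_A"] * 80 + ["clone_B"] * 20,
    "peptide_2": ["clone_A"] * 40 + ["clone_B"] * 60,
}
antigen, clone, fold = select_antigen(before, after)
print(antigen, clone, round(fold, 2))  # peptide_1 clone_A 1.6
```

A ratio of frequencies is used here because repertoire analysis yields relative frequencies rather than absolute cell counts; other scores (for example a difference in frequency) would fit the same structure.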

<EE2> <RACC of CCPCC>

A method of assessing efficacy and/or safety by an in vitro stimulation test using a cancer specific TCR gene isolated by the method of item <DD2>.

Specific steps of the efficacy and/or safety assessment are exemplified below.

<Efficacy> For instance, after a T cell introduced with a cancer specific TCR gene is cultured with the antigen peptide or antigen protein derived from the subject of <EE1-1>, the inactivated cancer cell derived from the subject of <EE1-2>, or the idiotype peptide derived from tumor of <EE1-3>, efficacy can be assessed by measuring the amount of cytokines (interferon γ or the like) secreted outside the cell in response to T cell activation, by measuring the expression level of a specific gene that is elevated in response to T cell activation, or by measuring a cell surface molecule that is expressed or upregulated in response to T cell activation.

<Safety> For instance, safety can be assessed by mixing a T cell derived from the subject and introduced with a cancer specific TCR gene with a normal cell derived from the subject, measuring the above-described cytokine secretion, gene expression, or cell surface molecule expression in response to T cell activation, and confirming that the T cell transgenically introduced with the TCR is not activated by the normal cell.

It is understood that the present invention can further be provided as a combination of one or more of the aforementioned features in addition to the explicitly shown combinations. Further embodiments and advantages of the present invention will be recognized by those skilled in the art upon reading and understanding the following Detailed Description as needed.

Advantageous Effects of Invention

The present invention has the effect of being capable of handling sequences on a “large scale” relative to conventional techniques. The present invention is considered especially advantageous for BCRs, in which numerous mutations are observed, in that it can amplify sequences in an “unbiased” manner regardless of mutations and determine them accurately. Compared with conventional systems whose amplification and sequencing methods use V chain specific primers, the present invention is considered 1. unbiased and 2. therefore excellent in quantifiability. The present invention is also advantageous over techniques such as SMART PCR in terms of 1. a significantly improved “level of unbiasedness” and 2. the absence of the disadvantages unique to each technique. For instance, an issue of repeated template switching has been reported for SMART, whereas the present system does not have such an issue. Further advantageous effects include 3. the capability of comprehensive analysis, including identification of isotypes and subtypes.

The system and method of the present invention can derive TCR and BCR repertoires of the α, β, γ, and δ chains for TCRs and of the IgM, IgD, IgA, IgG, and IgE heavy chains and the IgK and IgL light chains for BCRs, and can detect changes in the repertoires from various aspects. A C region primer for sequencing is placed at a suitable position in order to accurately determine the base sequence of the CDR3 region, which is important in identifying a disease specific TCR or BCR. Furthermore, the primer position is designed such that the isotype or subtype can be identified, so that a gene associated with a disease is readily identified.

All conventional techniques employed a plurality of PCRs using numerous V chain specific primers and had significant issues with quantification or precision. The present invention resolves such issues. Use of the analysis system of the present invention also accomplishes the following. For instance, the analysis system can screen for invariant TCRs. It was discovered that invariant TCRs can be screened for by searching, in large-scale TCRα chain repertoire analysis, for reads shared by numerous samples regardless of HLA. In fact, numerous TCRs derived from MAIT cells, which recognize the non-classical MHC molecule MR1, were detected. NKT cells, MAIT cells and the like expressing an invariant TCR are known to serve an important role in immune responses such as infection immunity, antitumor immunity and inflammation. It is expected that novel invariant TCRs can be screened for in various tissue samples and used to find cells with unique functions.
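The screening idea above, searching for TCRα reads shared by many samples without using HLA information, can be sketched as follows. This is a minimal illustration assuming each sample has been reduced to a set of TCRα clonotype labels; the function name, the `min_samples` threshold and the toy donor data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_clonotypes(samples: Dict[str, Set[str]], min_samples: int) -> List[str]:
    """Return clonotypes observed in at least `min_samples` distinct samples.

    Results are ordered by the number of samples sharing each clonotype
    (most shared first), then alphabetically for a stable result.
    """
    occurrence: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, clonotypes in samples.items():
        for clone in clonotypes:
            occurrence[clone].add(sample_id)
    hits = [clone for clone, ids in occurrence.items() if len(ids) >= min_samples]
    return sorted(hits, key=lambda clone: (-len(occurrence[clone]), clone))


# Hypothetical TCRα clonotype sets per donor; no HLA information is used.
samples = {
    "donor1": {"alpha_X", "alpha_a"},
    "donor2": {"alpha_X", "alpha_b"},
    "donor3": {"alpha_X", "alpha_Y"},
    "donor4": {"alpha_Y", "alpha_c"},
}
print(shared_clonotypes(samples, min_samples=2))  # ['alpha_X', 'alpha_Y']
```

Clonotypes that recur across many unrelated individuals are candidates for invariant TCRs and would still need to be checked against known invariant chains or functional data.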

Furthermore, the TCRα and TCRβ gene pair of an antigen specific TCR can be estimated. TCRα and TCRβ are receptor molecules forming a heterodimer, and an antigen specific T cell that proliferates in response to an antigen carries a specific, unique pair of TCRα and TCRβ chains. However, since TCR repertoire analysis amplifies TCRα and TCRβ genes separately, it is not possible to know which TCRα and which TCRβ form a pair. In this regard, paired TCRα and TCRβ chain genes can be estimated by examining whether the set of individuals sharing a specific TCRβ chain read matches the set of individuals sharing a TCRα chain read (FIG. 44). It was possible to estimate a matching TCRα chain by using the individuals sharing a specific TCRβ chain as an indicator (Table 3-11). Although in some cases a plurality of reads is assigned as candidates, this is considered a useful search method for identifying paired TCR genes.
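The pairing estimate above compares the set of individuals carrying a TCRβ read with the set carrying each TCRα read. A minimal sketch follows; it assumes per-individual presence/absence of each read is already known, and it uses Jaccard similarity with a hypothetical cut-off as the matching score, which is an illustrative substitute rather than the criterion used for FIG. 44 or Table 3-11.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of individuals (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_alpha_partners(
    beta_individuals: Set[str],
    alpha_occurrence: Dict[str, Set[str]],
    min_score: float = 0.7,
) -> List[Tuple[str, float]]:
    """Rank TCRα reads whose carrier individuals match those of one TCRβ read."""
    scored = [(alpha, jaccard(beta_individuals, ids)) for alpha, ids in alpha_occurrence.items()]
    hits = [(alpha, score) for alpha, score in scored if score >= min_score]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical data: the individuals in which each read was observed.
beta_individuals = {"s1", "s3", "s7"}
alpha_occurrence = {
    "alpha_1": {"s1", "s3", "s7"},
    "alpha_2": {"s1", "s2"},
    "alpha_3": {"s1", "s3", "s7", "s9"},
}
print(candidate_alpha_partners(beta_individuals, alpha_occurrence))
# [('alpha_1', 1.0), ('alpha_3', 0.75)]
```

Returning a ranked list rather than a single partner reflects the observation above that a TCRβ chain is sometimes matched to more than one candidate read.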

<Wet Associated Effect>

The present invention is especially useful in clinical applications where quantitative analysis is especially required and a sample is provided for highly precise, unbiased, large scale gene analysis. Further, the present invention can identify a “low frequency” (1/10,000 to 1/100,000 or lower) gene, leading to a more accurate diagnosis or therapy of leukemia or the like. This was not possible with conventional techniques (a method combining plating with an adaptor or a method combining plating with the SMART method) due to their detection limit (about 1%).
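Detecting a clone at 1/10,000 to 1/100,000 also depends on how many reads are sequenced. As a rough sketch, assuming reads are sampled independently at random from the amplified library (which ignores PCR and sequencing error), the probability of seeing at least one read from a clone of frequency f among N reads is 1 - (1 - f)^N. The function names and the 95% target are hypothetical.

```python
import math


def detection_probability(freq: float, n_reads: int) -> float:
    """Probability of sampling at least one read from a clone of frequency `freq`."""
    return 1.0 - (1.0 - freq) ** n_reads


def reads_needed(freq: float, target: float = 0.95) -> int:
    """Smallest read count giving a detection probability of at least `target`."""
    # log1p keeps precision when freq is very small (e.g. 1e-5).
    return math.ceil(math.log(1.0 - target) / math.log1p(-freq))


for freq in (1e-2, 1e-4, 1e-5):
    print(f"frequency {freq:g}: ~{reads_needed(freq):,} reads for 95% detection")
```

Under this idealized model, roughly 3 × 10^5 reads are needed for a 95% chance of observing a 1/100,000 clone at least once, which is why low frequency detection presupposes large scale sequencing.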

Further, a V specific technology has low quantifiability due to varying amplification efficiency among V specific primers. However, this technology performs amplification with one set of primers, thus enabling highly precise quantification in the truest sense.
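Why differing primer efficiencies hurt quantification can be shown with an idealized exponential PCR model, copies = initial × (1 + efficiency)^cycles. This is a simplified sketch that assumes constant per-cycle efficiency and no plateau; the efficiencies and cycle count are hypothetical values, not measurements from this work.

```python
def amplified_copies(initial: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential PCR: copies = initial * (1 + efficiency) ** cycles."""
    return initial * (1.0 + efficiency) ** cycles


def observed_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Apparent A:B ratio after PCR when A and B start with equal copy numbers."""
    return amplified_copies(1.0, eff_a, cycles) / amplified_copies(1.0, eff_b, cycles)


# Hypothetical efficiencies: two different V specific primers vs. one shared primer set.
print(round(observed_ratio(0.9, 0.8, 30), 2))  # about 5-fold distortion
print(observed_ratio(0.9, 0.9, 30))            # 1.0: no distortion
```

A 10-point difference in per-cycle efficiency compounds to an approximately 5-fold apparent difference after 30 cycles, whereas a single primer set applies the same efficiency to every template and leaves the ratio unchanged in this model.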

Further, since all TCRs or BCRs can be amplified with one set of primers, the primers and containers required for amplification can be reduced, cutting expenses.

Further, BCRs are characterized by having mutations. Thus, a method using V chain specific primers has disadvantages such as being essentially unable to amplify some genes or amplifying some genes with reduced efficiency. Meanwhile, the method of the present invention also solves these problems for BCRs.

Further, the analysis method of the present invention is advantageous in that it can complete the analysis in several minutes, whereas conventional techniques take overnight.

<In Silico Associated Effect>

Significant differences from the conventional and commonly used IMGT/HighV-QUEST include the following. IMGT/HighV-QUEST does not have a function for classifying a C region, and its repertoire classification is either by “unit of gene name” or by “unit of allele” (i.e., V (gene name)-D (gene name)-J (gene name) or V (allele)-D (allele)-J (allele)). Further, CDR3 classification is possible when performed separately from the above-described repertoire, but it offers no degree of freedom. On the other hand, the analysis method of the present invention can classify a C region and can select “unit of gene name” or “unit of allele” for each region in repertoire classification. Further, CDR3 can also be used instead of D.

Further, in addition to the classification methods of IMGT/HighV-QUEST, the present invention can also use combinations such as V (gene name)-D (allele)-J (allele), V (allele)-CDR3-J (allele) or the like. CDR3 can be used as a part of the above-described repertoire classification, or can also be classified individually. Further, the maximum number of sequences that can be processed in one batch is 150,000 in IMGT/HighV-QUEST, while it is unlimited in the analysis method of the present invention. The time required for processing the same data is approximately 1/10 with the present system.
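The per-region choice of “unit of gene name” or “unit of allele”, with CDR3 optionally standing in for D and a C region field available, can be sketched as a configurable classification key. This is a hypothetical illustration, not the implementation of Repertoire genesis; it assumes IMGT-style allele names of the form `GENE*NN`, and the `Assignment` class, field list format and example allele/CDR3 strings are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Assignment:
    """Hypothetical per-read gene assignment using IMGT-style 'GENE*NN' allele names."""
    v_allele: str
    d_allele: Optional[str]
    j_allele: str
    c_allele: Optional[str]
    cdr3: str


def repertoire_key(a: Assignment, fields: List[Tuple[str, str]]) -> str:
    """Build a classification key from ordered (region, level) pairs.

    region is "V", "D", "J", "C" or "CDR3"; level is "gene" or "allele"
    (ignored for CDR3). "|" separates fields because gene names contain "-".
    """
    alleles = {"V": a.v_allele, "D": a.d_allele, "J": a.j_allele, "C": a.c_allele}
    parts = []
    for region, level in fields:
        if region == "CDR3":
            parts.append(a.cdr3)
            continue
        allele = alleles[region]
        if allele is None:
            parts.append("none")
        elif level == "allele":
            parts.append(allele)
        else:
            parts.append(allele.split("*")[0])  # drop the allele suffix
    return "|".join(parts)


# Hypothetical assignments; allele numbers and CDR3 strings are illustrative only.
reads = [
    Assignment("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02", "IGHG4*01", "CARDGYW"),
    Assignment("IGHV3-23*04", "IGHD3-10*01", "IGHJ4*02", "IGHG4*01", "CARDGYW"),
]
by_gene_v = [("V", "gene"), ("D", "allele"), ("J", "allele")]
by_cdr3 = [("V", "allele"), ("CDR3", ""), ("J", "allele"), ("C", "gene")]
print(Counter(repertoire_key(r, by_gene_v) for r in reads))
print(Counter(repertoire_key(r, by_cdr3) for r in reads))
```

Because the resolution is chosen per region, the two reads collapse into one class under V (gene name)-D (allele)-J (allele) but remain separate under V (allele)-CDR3-J (allele)-C (gene name).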

<Effects Regarding Therapy>

The cancer idiotype peptide therapeutic method of the present invention is effective for patients in whom the target cancer cells have no specific marker (molecular target) effective for therapy, or in whom therapy with existing molecular targeted agents has no effect. That is, since a peptide is made based on the genetic information of cancer cells derived from an individual patient, an effect is exhibited on many tumors expressing a TCR or BCR. Depending on their origin, lymphomas and leukemias include T cell-derived tumors and B cell-derived tumors. The present technique is applicable to each of these tumor types and useful in the therapy of many patients. Further, when a tumor arising from a B cell subpopulation is targeted, antibody drugs such as anti-CD20 antibodies, which target a cell surface molecule expressed on the majority of B cells, are used. Such antibody drugs also act on normal B cells. Thus, such drugs act not only on cancer cells but also on normal cells, inducing side effects such as decreased immune capability. Meanwhile, therapy targeting only cancer cells, as in the present invention, is highly safe. When a cancer peptide is used, highly safe therapy can be realized by using a peptide that is more highly specific to cancer cells. Further, existing therapy using a cancer peptide is limited to patients with a specific HLA to which the peptide binds. Meanwhile, in the present invention a peptide is designed based on the genetic information of a patient. Thus, such a peptide is advantageous in not being limited by HLA type and being adaptable to a wide range of patients.

An existing CTL therapeutic method cocultures a lymphocyte of a patient with a tumor cell of the patient, and an existing DC therapeutic method cocultures a dendritic cell (DC) of a patient with a tumor cell of the patient, to induce a tumor specific killer T cell or a tumor specific DC, respectively. In addition, there is therapy that uses an artificial cancer antigen to stimulate a lymphocyte or DC and introduces the antigen into a patient to obtain an antitumor effect. As the antigen imparting specificity, a cancer antigen protein is considered more effective and to have fewer side effects than the entire tumor cell, and a peptide more so than a protein. Unlike a protein, a peptide is advantageous in that it can be chemically synthesized readily and directly based on genetic sequence information. Safety can be ensured since the manufacturing process of a peptide does not use biomaterials such as cells, media, or infectious substances. Safe therapy adapted to a wide range of patients can be realized by designing individual peptides compatible with the HLA of a patient based on the genetic information of a cancer cell.

A synergistic effect is expected from introducing tumor specific DCs and CTLs into a patient in autologous immune cell therapy. CTLs, having already been stimulated and activated by an antigen, are expected to exert an early therapeutic effect. Since tumor specific DCs induce CTLs in the patient into whom they are introduced, the antitumor effect is sustained. Thus, a synergistic antitumor effect is expected from the combined use of these different cells.

In cancer specific TCR gene therapy, it is important that expression of the target antigen is limited to cancer cells. Antigens localized in limited tissues such as cancer cells and testicular tissue, as in cancer-testis antigens, are selected for therapy. However, such antigens are known to be expressed in some normal cells as well, which may be a safety issue in therapy in some cases. The tailor-made cancer TCR gene therapy of the present invention identifies T cells that infiltrate a patient's tumor tissue and utilizes the genetic sequences of their TCRs. Thus, a functional TCR considered to actually have antitumor action in the patient's body is utilized, and a higher level of effect is expected. Further, since the TCR comes from a T cell in the patient's body, its action on normal cells is highly likely to be limited. Existing TCR gene therapy is limited to patients having a specific HLA and expressing a target cancer antigen. On the other hand, tailor-made therapy can individually produce a TCR that is specific to a cancer antigen derived from a patient and compatible with the patient's HLA, making therapy possible for a wider range of patients. A cancer specific TCR gene is isolated by in vitro stimulation of a patient's lymphocytes with an antigen protein, antigen peptide, inactivated cancer cell, idiotype peptide or the like. A TCR gene isolated through this experimental process for each patient is adapted to the patient's HLA type, cancer cell form, cancer antigen species and other genetic background, and is considered to be more effective in therapy.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows cross-reactivity of isotype specific primers. The left panel is an example related to a second IgM sample. The left end (L) shows a lane for a molecular weight marker. M, G, A, D, and E show results with IgM, IgG, IgA, IgD and IgE specific primers, respectively. The middle panel shows a result with a second IgG sample on the left side and a result with a second IgA sample on the right side. The right end (L) shows a lane for a molecular weight marker. M, G, A, D and E show results with IgM, IgG, IgA, IgD and IgE specific primers, respectively. The right panel shows a second IgD sample on the left side and a second IgE sample on the right side. The left end (L) shows a lane for a molecular weight marker. M, G, A, D and E show results with IgM, IgG, IgA, IgD and IgE specific primers, respectively. To assess the specificity of the immunoglobulin isotype specific primers used, amplification was performed with the isotype specific primer of interest and with the other isotype specific primers to check for cross-reactivity. 10 μL of each GS-PCR amplicon was electrophoresed on a 2% agarose gel in TAE buffer and assessed by ethidium bromide staining. A 2nd PCR amplicon amplified with each isotype specific primer was not amplified with the other isotype specific GS-PCR primers, verifying that the primers are highly specific.

FIG. 2 shows results of studying the optimal dilution concentration. The optimal GS-PCR conditions for each isotype were studied. A 2-fold serial dilution series of the 2nd PCR amplicon was prepared and subjected to 20 cycles of GS-PCR. The results are shown, from the left, for 1-, 2-, 4-, 8-, and 16-fold dilutions of the 2nd PCR amplicon for IgM, IgG, IgA, IgD, and IgE. L at the left end indicates a lane for a molecular weight marker. Excellent results were obtained with the 16-fold dilution.

FIG. 3 shows results of studying the optimal number of cycles. 16-fold diluted 2nd PCR amplicons were used for 10-, 15-, and 20-cycle PCR. The top panel shows the results for 20 cycles, the middle panel the results for 15 cycles, and the bottom panel the results for 10 cycles. In each panel, L at the left end indicates a lane for molecular weight markers, followed, from the left, by IgM, IgG, IgA, IgD and IgE. For IgM, IgG, IgA, and IgD, excellent amplification was confirmed with 10 cycles. Further, 20 cycles were confirmed to be appropriate for IgE.

FIG. 4 shows the read lengths from next generation sequencing of a BCR gene. The vertical axis indicates the number of library reads and the horizontal axis indicates the read length. The number of reads in the raw data was 130,000, and more than 90,000 reads passed the filter. Table 2 shows the number of reads from each isotype labeled with a Tag.

FIG. 5 shows results of analyzing read lengths for each MID. The top panel shows, from the left, IgM, IgG and IgA. The bottom panel shows, from the left, IgD and IgE. In each graph, the vertical axis indicates the number of reads and the horizontal axis indicates the read length (bases). The distributions of read length and number of reads were equal among the MIDs. When the read length sufficient for analyzing a V region was set at 400 bp or greater, about half of the reads, about 10,000 reads, were considered effective for BCR repertoire analysis.
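The per-MID count of reads long enough for V region analysis (400 bp or greater, as in FIG. 5) can be sketched as below. This assumes reads have already been demultiplexed into (MID label, sequence) pairs; the function name and the toy reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def effective_reads(reads: Iterable[Tuple[str, str]], min_len: int = 400) -> Dict[str, Tuple[int, int]]:
    """Return MID -> (total reads, reads with length >= min_len)."""
    total: Dict[str, int] = defaultdict(int)
    long_enough: Dict[str, int] = defaultdict(int)
    for mid, sequence in reads:
        total[mid] += 1
        if len(sequence) >= min_len:
            long_enough[mid] += 1
    return {mid: (total[mid], long_enough[mid]) for mid in total}


# Hypothetical demultiplexed reads as (MID label, sequence) pairs.
reads = [("MID-1", "A" * 450), ("MID-1", "A" * 300), ("MID-2", "A" * 400)]
print(effective_reads(reads))  # {'MID-1': (2, 1), 'MID-2': (1, 1)}
```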

FIG. 6A shows results of analyzing the usage frequency of C region sequences for each isotype. The top panel shows, from the left, IgM, IgG and IgA. The bottom panel shows, from the left, IgD and IgE. In each graph, the vertical axis indicates % and the horizontal axis indicates the identified C region gene name. A homology search against the C region sequences of immunoglobulin isotypes, including subclasses, was performed on the reads obtained for each isotype. The frequencies of reads for each subclass were 73% for IgA1 and 27% for IgA2 in the IgA subclass, and 62% for IgG1 and 36% for IgG2 in the IgG subclass, while hardly any reads were obtained for IgG3 or IgG4. Further, since the reads obtained for each subclass were rarely classified into other classes, primer specificity was reconfirmed at the sequence level. FIG. 6A shows analysis with IMGT/HighV-QUEST.

FIG. 6B shows results of an analysis similar to FIG. 6A with improved software (Repertoire genesis). Similar results were obtained with this software. Furthermore, it was also possible to obtain a “no hit” result, which indicates a read that is not classified into any isotype or subtype.

FIGS. 7A and 7B show results of analyzing a V region repertoire for each isotype. IgM, IgG, IgA, IgD and IgE are shown from the top. The horizontal axis indicates the name of each isotype. A repertoire of V region sequences for each isotype (BCR V repertoire) is shown. BCR V repertoires were very similar among IgM, IgG, IgA, and IgD, but only reads having IGHV3-30 were obtained for IgE. A possible reason is that there are far fewer IgE-positive cells in the peripheral blood than cells of other classes, so that a biased repertoire was detected. FIGS. 7A and 7B show analysis with IMGT/HighV-QUEST.

FIGS. 7C and 7D show results of an analysis similar to FIGS. 7A and 7B with improved software (Repertoire genesis). Similar results were obtained with this software. Furthermore, it was also possible to obtain a “no hit” result.

FIGS. 8A and 8B show results of analyzing a V region repertoire for each subtype. From the top, IgA1, IgA2, IgG1 and IgG2 are shown. The horizontal axis indicates the name of each subclass. A BCR V repertoire is shown for each of the IgA and IgG subclasses. Within IgA, the frequencies of several V genes differed between IgA1 and IgA2: IGHV1-18 and IGHV4-39 were more frequent in IgA1 than in IgA2, while IGHV3-23 and IGHV3-74 were more frequent in IgA2 than in IgA1. Within IgG, IGHV3-23 and IGHV3-74, which were increased in IgA2, were more frequent in IgG2 than in IgG1. There were few reads for IgG3 and IgG4 (10 reads each). Clones with IGHV4-59-IGHJ4-IGHD1-7 accounted for 3/10 reads in IgG3, indicating high clonality. Reads with IGHV3-23-IGHJ4-IGHD3-10 accounted for 5/10 in IgG4 (Table 1-3). FIGS. 8A and 8B show analysis with IMGT/HighV-QUEST.

FIGS. 8C and 8D show results of an analysis similar to FIGS. 8A and 8B with improved software (Repertoire genesis). Similar results were obtained with this software. Furthermore, it was also possible to obtain a “no hit” result.

FIG. 9A shows results of analysis of a BCR J repertoire for each subclass. The top panel shows each of IgM, IgG, IgA, IgD and IgE, and its horizontal axis indicates each isotype name. The bottom panel shows each subclass, from the left, IgA1, IgA2, IgG1 and IgG2, and its horizontal axis indicates each subclass name. IGHJ4 was used in about half of the reads in IgM, IgG, IgA and IgD, while IGHJ2 was hardly used. Only IGHJ1 was used in IgE. The IGHJ repertoire in the subclasses of IgG and IgA was also studied; unlike the IGHV repertoire, no significant difference among subclasses was observed. FIG. 9A shows analysis with IMGT/HighV-QUEST.

FIG. 9B shows results of an analysis similar to FIG. 9A with improved software. Similar results were obtained with the software (Repertoire genesis) that is pending together with this patent application. Furthermore, it was also possible to obtain a “no hit” result.

FIG. 10 shows a schematic diagram of an amplification method of a TCR gene. Explanation is provided for a primer pair exemplified in the Examples. Amplification was performed with a B-P20EA primer, which is a P20EA adaptor primer with an added adaptor sequence (B-adaptor), and a primer which is a 3rd nested primer with an added A-adaptor and an identification sequence, the MID Tag sequence (denoted as MID, MID-1 to 26). Key indicates TCAG.

FIG. 11 shows results of electrophoresis of 10 μL of GS-PCR amplicon derived from 10 healthy individuals on a 2% agarose gel. The top row shows the GS-PCR (TRA) TCRα chain amplicons, and the bottom row shows the GS-PCR (TRB) TCRβ chain amplicons. The numbers indicate sample numbers.

FIG. 12 shows the parameter settings of the TCR/BCR repertoire analysis software (Repertoire genesis).

FIGS. 13 (A-D) show results of analysis of a TRAV repertoire in healthy individuals. Each figure shows the TRAV repertoire of each sample (see numbers). The horizontal axis indicates each TRAV gene name and the vertical axis indicates its frequency. Mean indicates the mean value. TRAV repertoires for 10 healthy individuals and their mean values are shown. The frequencies of TRAV9-2, TRAV12 and TRAV13 were high. TRAV20 in #1 and TRAV21 in #5 were higher than in other healthy individuals, indicating variation among individuals.
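The usage frequencies plotted in FIGS. 13 to 16, and the “Mean” bar, can be sketched as per-sample percentages averaged across the cohort. This is a minimal illustration assuming each read already carries a V (or J) gene call; flagging a sample whose usage exceeds a multiple of the cohort mean is a hypothetical criterion added for illustration, since the figures themselves only describe individuals as “higher than other healthy individuals”.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def usage(calls: List[str]) -> Dict[str, float]:
    """Percentage of reads assigned to each gene in one sample."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {gene: 100.0 * n / total for gene, n in counts.items()}


def usage_table(samples: Dict[str, List[str]]):
    """Per-sample usage plus the cohort mean (genes absent in a sample count as 0%)."""
    per_sample = {sid: usage(calls) for sid, calls in samples.items()}
    genes = sorted({g for u in per_sample.values() for g in u})
    mean_usage = {g: mean(u.get(g, 0.0) for u in per_sample.values()) for g in genes}
    return per_sample, mean_usage


def high_usage(per_sample: Dict[str, Dict[str, float]], mean_usage: Dict[str, float],
               fold: float = 2.0) -> List[Tuple[str, str]]:
    """(sample, gene) pairs whose usage exceeds `fold` times the cohort mean."""
    return sorted(
        (sid, gene)
        for sid, u in per_sample.items()
        for gene, pct in u.items()
        if pct > fold * mean_usage[gene]
    )


# Hypothetical TRAV calls for three individuals.
samples = {
    "#1": ["TRAV12"] * 5 + ["TRAV20"] * 5,
    "#2": ["TRAV12"] * 10,
    "#3": ["TRAV12"] * 10,
}
per_sample, mean_usage = usage_table(samples)
print(high_usage(per_sample, mean_usage))  # [('#1', 'TRAV20')]
```

The same functions apply unchanged to TRBV, TRAJ and TRBJ calls.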

FIGS. 14 (A-D) show results of analysis of a TRBV repertoire in healthy individuals. Each figure shows the TRBV repertoire of each sample (see numbers). The horizontal axis indicates each TRBV gene name and the vertical axis indicates its frequency. Mean indicates the mean value. TRBV repertoires for 10 healthy individuals and their mean values are shown. The frequencies of TRBV20-1, TRBV28 and TRBV29-1 were high. TRBV3-1 in #8 was higher than in other healthy individuals, indicating variation among individuals.

FIGS. 15 (A-D) show results of analysis of a TRAJ repertoire in healthy individuals. The horizontal axis indicates each TRAJ gene name and the vertical axis indicates its frequency. Mean indicates the mean value. TRAJ repertoires for 10 healthy individuals and their mean values are shown. In the TRAJ repertoires of healthy individuals, each TRAJ gene accounted for about 5% or less. TRAJ12 in #1, TRAJ27 in #4, TRAJ37 in #5, and TRAJ45 in #8 were higher than in other healthy individuals, indicating variation among individuals.

FIG. 16 shows results of analysis of a TRBJ repertoire in healthy individuals. The horizontal axis indicates each TRBJ gene name and the vertical axis indicates the frequency of presence thereof. TRBJ repertoires for 10 healthy individuals and their mean (“Mean”) are shown. In the TRBJ repertoires of healthy individuals, the frequencies of TRBJ2-1, TRBJ2-3, and TRBJ2-7 were high, and TRBJ2-2 was high in #8, exhibiting variations among individuals.

FIG. 17 shows the result of 2% agarose gel electrophoresis of each 2nd PCR amplicon synthesized in Preparation Example 3, performed to verify that amplicons of the intended size were obtained.

FIG. 18 shows, on the top row, an example of a possible primer setting region of TRAC (the target sequence is an artificially spliced functional TRAC exon region sequence consisting of exons EX1, EX2, and EX3, and a primer can be set anywhere along its entire length). The bottom row shows a possible primer setting region of TRBC (the target sequence is an artificially spliced functional TRBC exon region sequence consisting of exons EX1, EX2, EX3, and EX4, and a primer can be set anywhere along its entire length). It is understood that a TRAC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1376) or a mutant thereof. It is understood that a TRBC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1377), SEQ ID NO: 1392 or 1393, or another mutant thereof. The following applies to FIGS. 18-25. Each primer set sequence shown in the full-length sequence is merely an example. A first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once the first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set.

FIG. 19 shows, on the top row, an example of a possible primer setting region of TRGC (the target sequence is an artificially spliced functional TRGC exon region sequence consisting of exons EX1, EX2, and EX3, and a primer can be set anywhere along its entire length). The bottom row shows a possible primer setting region of TRDC (the target sequence is an artificially spliced functional TRDC exon region sequence consisting of exons EX1, EX2, EX3, and EX4, and a primer can be set anywhere along its entire length). It is understood that a TRGC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1378), SEQ ID NO: 1394, 1395, 1396, 1397, 1398, or 1399, or a mutant thereof. It is understood that a TRDC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1379) or a mutant thereof.

FIG. 20 shows an example of a possible primer setting region of IGHM (the target sequence is an artificially spliced functional IGHM exon region sequence; the secreted form consists of exons CH1, CH2, CH3, CH4, and CH-S, and the membrane-bound form consists of CH1, CH2, CH3, CH4, M1, and M2). The figure shows an example of the membrane-bound form. It is understood that an IGHM sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1380), SEQ ID NO: 1447, 1448, or 1449, or a mutant thereof. A primer can be set anywhere along the entire length.

FIG. 21 shows an example of a possible primer setting region of IGHA (the target sequence is an artificially spliced functional IGHA exon region sequence; the secreted form consists of exons CH1, H, CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H, CH2, CH3, M1, and M2). The figure shows an example of the secreted form. It is understood that an IGHA sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1381), SEQ ID NO: 1400, 1401, 1402, or 1403, or a mutant thereof. A primer can be set anywhere along the entire length.

FIG. 22 shows an example of a possible primer setting region of IGHG (the target sequence is an artificially spliced functional IGHG exon region sequence; the secreted form consists of exons CH1, H (H1, H2, H3, H4), CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H (H1, H2, H3, H4), CH2, CH3, M1, and M2). The figure shows an example of the secreted form. It is understood that an IGHG sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1382), any of SEQ ID NOs: 1412-1446, or a mutant thereof. A primer can be set anywhere along the entire length.

FIG. 23 shows an example of a possible primer setting region of IGHD (the target sequence is an artificially spliced functional IGHD exon region sequence; the secreted form consists of exons CH1, H1, H2, CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H1, H2, CH2, CH3, M1, and M2). The figure shows an example of the membrane-bound form. It is understood that an IGHD sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1383), any of SEQ ID NOs: 1404-1406, or a mutant thereof. A primer can be set anywhere along the entire length.

FIG. 24 shows an example of a possible primer setting region of IGHE (the target sequence is an artificially spliced functional IGHE exon region sequence; the secreted form consists of exons CH1, CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, CH2, CH3, M1, and M2). The figure shows an example of the secreted form. It is understood that an IGHE sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1384), any of SEQ ID NOs: 1407-1411, or a mutant thereof. A primer can be set anywhere along the entire length.

FIG. 25 shows, on the top row, an example of a possible primer setting region of IGKC (the target sequence is a functional IGKC CL sequence; it is understood that an IGKC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1379) or a mutant thereof, and a primer can be set anywhere along its entire length). The bottom row shows a possible primer setting region of IGLC (the target sequence is a functional IGLC CL sequence; it is understood that an IGLC sequence used as a target sequence can be the illustrated sequence (SEQ ID NO: 1379) or a mutant thereof, and a primer can be set anywhere along its entire length).

FIG. 26 shows an image of RNA electrophoresis by an Agilent 2100 Bioanalyzer. Total RNA was extracted from a serially diluted cell solution, and the amount of RNA was measured with the Agilent Bioanalyzer. The RNA was separated with a microchip electrophoresis apparatus to check its quality. 28S (top band) and 18S rRNA (bottom band) were detected in each sample, demonstrating that undegraded RNA was obtained.

FIGS. 27 (A-D) show TCR reads in serially diluted Molt-4 cell samples (SEQ ID NOs: 1165-1324). TCR reads acquired from each of the 10%, 1%, 0.1%, and 0.01% serially diluted Molt-4 samples are described. The reads were ranked in descending order of read count, and the top 40 positions are shown; for the 0.01% sample, ranks 365 to 404 are shown. The TRBV, TRBJ, and CDR3 amino acid sequences and the number of reads are shown for each read. Functional TCR reads (TRBV20-1/TRBJ2-1/CSARESTSDPKNEQFFG) (SEQ ID NO: 1163) derived from Molt-4 are shown in bold with a gray background. The other TCR reads, estimated to be functionally deficient (TRBV10-3/TRBJ2-5/CAISEPTGIRRDPVLR) (SEQ ID NO: 1164), are shown in bold.
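
The ranking used in FIGS. 27 (A-D) can be sketched as counting reads per TRBV/TRBJ/CDR3 combination and sorting by descending count. This minimal Python sketch assumes each read has already been assigned a (TRBV, TRBJ, CDR3) triple; the `rank_clonotypes` helper and the toy read list are hypothetical illustrations, not the analysis software used in the examples.

```python
from collections import Counter

def rank_clonotypes(reads, start=1, stop=40):
    """Rank clonotypes by read count and return ranks start..stop (1-based, inclusive).

    Each read is a (TRBV gene, TRBJ gene, CDR3 amino acid sequence) tuple.
    Clonotypes with equal counts keep the order in which they were first seen.
    """
    ranked = Counter(reads).most_common()  # descending read count
    return [
        (rank, clonotype, n)
        for rank, (clonotype, n) in enumerate(ranked, start=1)
        if start <= rank <= stop
    ]

# Hypothetical toy data: the two Molt-4 clonotypes plus one invented background clonotype.
reads = (
    [("TRBV20-1", "TRBJ2-1", "CSARESTSDPKNEQFFG")] * 5
    + [("TRBV10-3", "TRBJ2-5", "CAISEPTGIRRDPVLR")] * 3
    + [("TRBV29-1", "TRBJ2-7", "CSVEGQGAYEQYF")] * 1
)

for rank, (v, j, cdr3), n in rank_clonotypes(reads):
    print(rank, v, j, cdr3, n)
```

For the 0.01% sample, a call such as `rank_clonotypes(reads, 365, 404)` would return the rank window shown in the figure.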

FIG. 28 shows the detection sensitivity and the number of TCR reads in serially diluted Molt-4 cell samples. Two TCR reads were detected from Molt-4 cells (●: TRBV20-1/TRBJ2-1/CSARESTSDPKNEQFFG (SEQ ID NO: 1163); ○: TRBV10-3/TRBJ2-5/CAISEPTGIRRDPVLR (SEQ ID NO: 1164)). The figure shows the percentage of Molt-4-derived TCR reads among the TCR reads acquired from each of the 10%, 1%, 0.1%, and 0.01% serially diluted Molt-4 samples. The detection limit for each read was 0.1% (●) and 0.01% (○).
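
The percentages plotted in FIG. 28 are each clonotype's read count divided by the total number of TCR reads in the sample. A minimal Python sketch, assuming reads are (TRBV, TRBJ, CDR3) tuples; the helper name, the toy sample, and the rule that a single read counts as detection are assumptions made for illustration.

```python
from collections import Counter

# The two Molt-4-derived clonotypes described for FIG. 28.
MOLT4_FUNCTIONAL = ("TRBV20-1", "TRBJ2-1", "CSARESTSDPKNEQFFG")
MOLT4_NONFUNCTIONAL = ("TRBV10-3", "TRBJ2-5", "CAISEPTGIRRDPVLR")

def target_percentages(reads, targets, min_reads=1):
    """Return {target: (percentage of all reads, detected?)} for each target clonotype.

    A target is treated as detected when it has at least `min_reads` reads
    (a hypothetical threshold, not one stated in the text).
    """
    counts = Counter(reads)
    total = len(reads)
    result = {}
    for t in targets:
        pct = 100.0 * counts[t] / total if total else 0.0
        result[t] = (pct, counts[t] >= min_reads)
    return result

# Hypothetical toy sample of 1,000 reads with an invented background clonotype.
sample = (
    [MOLT4_FUNCTIONAL] * 2
    + [MOLT4_NONFUNCTIONAL] * 1
    + [("TRBV28", "TRBJ2-1", "CASSLGQGNEQFF")] * 997
)
print(target_percentages(sample, [MOLT4_FUNCTIONAL, MOLT4_NONFUNCTIONAL]))
```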

FIG. 29 is a schematic diagram showing the flow of TCR data analysis.

FIG. 30 is a schematic diagram showing the flow of BCR data analysis.

FIG. 31 is a diagram showing the frequency of C region genes for each class. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIGS. 32 (A and B) are diagrams showing a comparison of V repertoires among classes. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIG. 33 is a diagram showing a comparison of J repertoires among classes. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIGS. 34 (A and B) are diagrams showing a comparison of V repertoires among subclasses. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIG. 35 is a diagram showing a comparison of J repertoires among subclasses. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIGS. 36 (A and B) are diagrams showing a comparison of IgM V repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIG. 37 is a diagram showing a comparison of IgM J repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. No-hit indicates the frequency of sequences that do not match any gene.

FIG. 38 (A-D) show a comparison of TRAV repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. “Mean” is the mean of all specimens, and the error bars indicate ± standard deviation.
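
The “Mean” bars and error bars in FIGS. 38-41 can be reproduced by computing each gene's frequency per specimen and then its mean and standard deviation across specimens. This Python sketch assumes each specimen is given as a list of assigned gene names (one per read), treats a gene absent from a specimen as 0%, and uses the sample standard deviation; the text does not state which standard deviation was used, and the function names are hypothetical.

```python
import statistics
from collections import Counter

def gene_usage_percent(genes):
    """Frequency (%) of each gene among one specimen's reads."""
    counts = Counter(genes)
    total = sum(counts.values())
    return {g: 100.0 * n / total for g, n in counts.items()}

def mean_sd_across_specimens(specimens):
    """Mean and sample standard deviation of each gene's frequency (%) across specimens.

    A gene absent from a specimen contributes 0% for that specimen.
    """
    usages = [gene_usage_percent(s) for s in specimens]
    all_genes = sorted(set().union(*usages))  # union of gene names over specimens
    result = {}
    for g in all_genes:
        values = [u.get(g, 0.0) for u in usages]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        result[g] = (statistics.mean(values), sd)
    return result

# Hypothetical toy specimens (one gene name per read).
summary = mean_sd_across_specimens([
    ["TRAV1-2", "TRAV1-2", "TRAV12-1", "TRAV12-1"],
    ["TRAV1-2", "TRAV12-1", "TRAV12-1", "TRAV12-1"],
])
print(summary)
```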

FIGS. 39 (A-D) show a comparison of TRBV repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. “Mean” is the mean of all specimens, and the error bars indicate ± standard deviation.

FIGS. 40 (A-D) show a comparison of TRAJ repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. “Mean” is the mean of all specimens, and the error bars indicate ± standard deviation.

FIG. 41 shows a comparison of TRBJ repertoires among specimens. The vertical axis indicates the frequency (%) and the horizontal axis indicates the gene names. “Mean” is the mean of all specimens, and the error bars indicate ± standard deviation.

FIG. 42 shows a block diagram of the system of the present invention.

FIG. 43 shows a flow chart for the processing of the present invention.

FIG. 44 shows the distribution of the number of copies of unique reads in TCRα and TCRβ chain repertoire analysis. The distribution was examined for the unique reads (distinct base sequences) among all sequence reads, with the number of copies on the horizontal axis. Reads detected only once (singletons) accounted for 73.3% (1250 reads) of the total for the TCRα chain and 70.5% (6502 reads) for the TCRβ chain.
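
The distribution in FIG. 44 counts, for each unique read, how many copies were sequenced, and then how many unique reads share each copy number. A minimal Python sketch, assuming the input is a list of read sequences and that the singleton percentage is taken over the unique reads (the text does not name the denominator); the function names are hypothetical.

```python
from collections import Counter

def copy_number_distribution(sequences):
    """Map copy number -> number of unique reads having that many copies."""
    copies_per_unique = Counter(sequences)       # unique sequence -> copies
    return Counter(copies_per_unique.values())   # copies -> number of unique reads

def singleton_fraction(sequences):
    """Percentage of unique reads that were observed exactly once."""
    dist = copy_number_distribution(sequences)
    n_unique = sum(dist.values())
    return 100.0 * dist.get(1, 0) / n_unique if n_unique else 0.0

# Hypothetical toy reads.
seqs = ["AAA", "AAA", "CCC", "GGG", "TTT", "TTT", "TTT"]
print(copy_number_distribution(seqs), singleton_fraction(seqs))
```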

FIG. 45 shows TRAV and TRAJ repertoires. The usage frequency of each TRAV and TRAJ gene among all reads is shown. The horizontal axis indicates TRAV genes (top graph) and TRAJ genes (bottom graph). The vertical axis indicates the percentage of all reads (% Usage).

FIG. 46 shows a 3D plot of a TRA repertoire. The usage frequency of each TRAV-TRAJ combination among all reads is shown in a three-dimensional plot. The horizontal axis indicates the TRAJ gene, the depth axis indicates the TRAV gene, and the vertical axis indicates usage frequency (% Usage). The combination of TRAV10 and TRAJ15 exhibited the highest usage frequency (12.53%).
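
The % Usage of each V-J combination in FIGS. 46 and 48 is the number of reads carrying that combination divided by all reads. A minimal Python sketch, assuming each read has already been assigned a (V gene, J gene) pair; the function names and toy data are hypothetical.

```python
from collections import Counter

def vj_usage(assignments):
    """Usage frequency (%) of each (V, J) combination among all reads.

    `assignments` is a list of (V gene, J gene) pairs, one per read.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}

def top_combination(assignments):
    """Return the most frequently used (V, J) pair and its % Usage."""
    usage = vj_usage(assignments)
    pair = max(usage, key=usage.get)
    return pair, usage[pair]

# Hypothetical toy assignments.
assignments = [("TRAV10", "TRAJ15")] * 3 + [("TRAV1-2", "TRAJ33")]
print(top_combination(assignments))
```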

FIG. 47 shows TRBV and TRBJ repertoires. The usage frequency of each TRBV and TRBJ gene among all reads is shown. The horizontal axis indicates TRBV genes (top graph) and TRBJ genes (bottom graph). The vertical axis indicates the percentage of all reads (% Usage).

FIG. 48 shows a 3D plot of a TRB repertoire. The usage frequency of each TRBV-TRBJ combination among all reads is shown in a three-dimensional plot. The horizontal axis indicates the TRBV gene, the depth axis indicates the TRBJ gene, and the vertical axis indicates usage frequency (% Usage). The combination of TRBV29-1 and TRBJ2-7 exhibited the highest usage frequency (28.57%).

FIG. 49 is a schematic diagram of a method of estimating a TCRαβ pair read (see Example 3 of the analysis system).

FIG. 50 shows a schematic diagram of MiSeq dual-indexed paired-end sequencing in Example 4 of the analysis system.

FIG. 51 shows the usage of TRAV and TRAJ in 20 healthy individuals. The number of TCR sequences having each TRAV and TRAJ gene was counted. The frequency percentages of 54 TRAV and 61 TRAJ genes were calculated and are shown as scatter diagrams. Each dot indicates the frequency percentage of a TRAV or TRAJ gene in one individual. The horizontal line indicates the mean value for the 20 individuals. (P): pseudogene, (ORF): open reading frame.

FIG. 52 shows the usage of TRBV and TRBJ in 20 healthy individuals. The frequency percentages of 65 TRBV and 14 TRBJ genes are shown as scatter diagrams. Each dot indicates the frequency percentage of a TRBV or TRBJ gene in one individual. The red bar indicates the mean value. (P): pseudogene, (ORF): open reading frame.

FIG. 53 shows the frequency of genetic recombination events between TRAV and TRAJ in read data pooled from 20 healthy individuals. The number of TCR sequence reads carrying each recombination was counted for each TRAV-TRAJ pair. The tendency of recombination is visualized as a heat map of the number of each recombination, in which the color of each pixel indicates the count. For TRAV, 8 pseudogenes (TRAV8-5, TRAV11, TRAV15, TRAV28, TRAV31, TRAV32, TRAV33, and TRAV37), 1 ORF (TRAV8-7), and genes that were not sufficiently expressed (TRAV7, TRAV9-1, TRAV18, and TRAV36) were excluded. For TRAJ, 3 pseudogenes (TRAJ51, TRAJ55, and TRAJ60), 6 ORFs (TRAJ1, TRAJ2, TRAJ19, TRAJ25, TRAJ59, and TRAJ61), and genes that were not sufficiently expressed (TRAJ14 and TRAJ46) were excluded, whereas 2 ORFs found to be expressed (TRAJ35 and TRAJ48) were included. The heat map displays 2050 TRAV-TRAJ combinations (41 TRAV × 50 TRAJ).

FIG. 54 shows a 3D image of a TCRα repertoire. The number of TCR sequence reads carrying each TRAV-TRAJ recombination was counted. The mean frequency percentages in 20 healthy individuals for all 3294 combinations (54 TRAV × 61 TRAJ) are shown as a 3D bar graph. The X axis and Y axis indicate TRAV and TRAJ, respectively. The recombination of TRAV1-2 with TRAJ33 (AV1-2/AJ33) was the most frequently expressed (0.99 ± 0.85). (P): pseudogene, (ORF): open reading frame.

FIG. 55 shows a 3D image of a TCRβ repertoire. The number of TCR sequence reads carrying each TRBV-TRBJ recombination was counted. The mean frequency percentages in 20 healthy individuals for all 910 combinations (65 TRBV × 14 TRBJ) are shown as a 3D bar graph. The X axis and Y axis indicate TRBV and TRBJ, respectively. (P): pseudogene, (ORF): open reading frame.

FIG. 56 shows digital CDR3 length distributions for TCRα and TCRβ. The CDR3 length was determined for 172109 TCRα and 94928 TCRβ sequence reads obtained from data pooled from 20 individuals. The length of the nucleotide sequence from the conserved cysteine at position 104 (Cys104) to the conserved phenylalanine at position 118 (Phe118) (IMGT numbering) was automatically calculated using the RG software. The distributions of CDR3 lengths in TCRα (top) and TCRβ (bottom) are shown as histograms.
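
The CDR3 length in FIG. 56 is the nucleotide span from the Cys104 codon through the Phe118 codon. The Python sketch below assumes the CDR3 region has already been extracted in frame (for example, by alignment to IMGT reference genes, which is not shown), accepts only Phe codons at the 3′ end as the text describes, and builds a length histogram; it is a hypothetical illustration and does not reproduce the RG software.

```python
from collections import Counter

CYS_CODONS = {"TGT", "TGC"}
PHE_CODONS = {"TTT", "TTC"}

def cdr3_length(nt):
    """Nucleotide length from the Cys104 codon through the Phe118 codon, inclusive.

    Assumes `nt` is an in-frame sequence already trimmed to start at Cys104 and
    end at Phe118; returns None if the frame or boundary codons do not fit.
    """
    nt = nt.upper()
    if len(nt) % 3 or len(nt) < 6:
        return None
    if nt[:3] not in CYS_CODONS or nt[-3:] not in PHE_CODONS:
        return None
    return len(nt)

def length_histogram(sequences):
    """Count CDR3 lengths, skipping sequences that fail the checks above."""
    lengths = (cdr3_length(s) for s in sequences)
    return Counter(n for n in lengths if n is not None)

print(length_histogram(["TGTGCCAGCTTT", "TGCTTC", "AAA"]))
```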

FIG. 57 shows the diversity of TCRα and TCRβ repertoires in healthy individuals. The number of copies (number of reads) of each unique sequence read (USR) was calculated. The mean number of copies per USR in each individual is shown as a white circle (left). The inverse Simpson index (middle) and Shannon-Weaver index (right) were calculated using an R program in accordance with the equations described in the Materials and Methods section of Example 5 of the analysis system. Each white circle indicates the index for one individual. There was no significant difference in the mean number of copies, inverse Simpson index, or Shannon-Weaver index between TCRα and TCRβ.
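
The three quantities in FIG. 57 can be computed from the copy number of each USR. The Python sketch below uses the common textbook forms, inverse Simpson 1/Σp_i² and Shannon-Weaver H = -Σp_i ln p_i, where p_i is the fraction of reads belonging to USR i; the exact equations in the Materials and Methods of Example 5 (for example, a bias-corrected Simpson form or a different logarithm base) may differ, and the function name is hypothetical.

```python
import math
from collections import Counter

def diversity_indices(sequences):
    """Return (mean copies per USR, inverse Simpson index, Shannon-Weaver index).

    `sequences` holds one entry per sequence read; identical entries are copies
    of the same unique sequence read (USR).
    """
    counts = Counter(sequences)
    if not counts:
        raise ValueError("no reads supplied")
    total = sum(counts.values())
    props = [n / total for n in counts.values()]
    mean_copies = total / len(counts)
    inverse_simpson = 1.0 / sum(p * p for p in props)
    shannon_weaver = -sum(p * math.log(p) for p in props)  # natural logarithm
    return mean_copies, inverse_simpson, shannon_weaver

print(diversity_indices(["A", "A", "B", "B"]))
```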

FIG. 58 shows the similarity of TCRα and TCRβ repertoires in healthy individuals. The frequency of TCR sequence reads shared between every pair of individuals was calculated (Table 4-6 and Table 4-7). The mean frequency percentages of shared reads were compared between TCRα and TCRβ (left, n=380). The Morisita-Horn index, a similarity index, was calculated using an R program in accordance with the equation described in the Materials and Methods section of Example 5 of the analysis system. There were significant differences between TCRα and TCRβ in the similarity index and in the frequency of shared reads (p<0.001 and p<0.001, respectively; Mann-Whitney U test).
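
A minimal Python sketch of the Morisita-Horn index between two repertoires, using the standard form C = 2Σp_i q_i / (Σp_i² + Σq_i²), with p_i and q_i the relative read frequencies of sequence i in each individual. Whether Example 5 uses exactly this form is an assumption, and the function name is hypothetical.

```python
from collections import Counter

def morisita_horn(reads_x, reads_y):
    """Morisita-Horn similarity between two non-empty repertoires given as lists of reads.

    Returns 1.0 when the two relative-frequency profiles are identical and
    0.0 when the repertoires share no sequence.
    """
    x, y = Counter(reads_x), Counter(reads_y)
    total_x, total_y = sum(x.values()), sum(y.values())
    p = {s: n / total_x for s, n in x.items()}
    q = {s: n / total_y for s, n in y.items()}
    shared = p.keys() & q.keys()
    cross = sum(p[s] * q[s] for s in shared)
    denom = sum(v * v for v in p.values()) + sum(v * v for v in q.values())
    return 2.0 * cross / denom

print(morisita_horn(["A", "A", "B"], ["A", "B", "B"]))
```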

FIG. 59 shows that public TCRs had shorter CDR3s than private TCRs. The CDR3 length was calculated for 7237 USRs of public TCRs (gray) and 83997 USRs of private TCRs (black). The frequency percentages of USRs at each CDR3 length are plotted as a bar graph. The median CDR3 lengths in public and private TCRs were 39 and 42, respectively.
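
Public and private TCRs can be separated by counting how many individuals carry each USR. This Python sketch assumes “public” means a sequence found in at least two individuals and “private” means found in exactly one (the text does not define the threshold), and that sequences are CDR3 nucleotide strings whose length is the CDR3 length; the function names and toy repertoires are hypothetical.

```python
import statistics

def split_public_private(repertoires):
    """Split unique sequences into public (>= 2 individuals) and private (1 individual).

    `repertoires` maps an individual ID to an iterable of CDR3 nucleotide sequences.
    """
    n_individuals = {}
    for seqs in repertoires.values():
        for s in set(seqs):  # count each individual at most once per sequence
            n_individuals[s] = n_individuals.get(s, 0) + 1
    public = {s for s, n in n_individuals.items() if n >= 2}
    private = {s for s, n in n_individuals.items() if n == 1}
    return public, private

def median_cdr3_length(seqs):
    """Median nucleotide length of a collection of CDR3 sequences."""
    return statistics.median([len(s) for s in seqs])

# Hypothetical toy repertoires for two individuals.
reps = {
    "donor1": ["TGTAAATTT", "TGTGGGCCCTTT"],
    "donor2": ["TGTAAATTT", "TGTCCCAAAGGGTTT"],
}
public, private = split_public_private(reps)
print(median_cdr3_length(public), median_cdr3_length(private))
```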

FIG. 60 shows the correlation of TRAV, TRAJ, TRBV, and TRBJ gene usage among healthy individuals. The frequency percentages of TRAV (top left), TRAJ (top right), TRBV (bottom left), and TRBJ (bottom right) are plotted for all pairs of individuals. Dots lying closer to the diagonal line (y=x) indicate better correlation.

FIG. 61 shows correlation coefficients of TRAV, TRAJ, TRBV, and TRBJ usage. The correlation coefficient between samples from two healthy individuals was calculated by Spearman's correlation test. Each dot indicates the correlation coefficient for one pair of individuals. The mean correlation coefficient is indicated by a horizontal line (n=190).
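
With 20 individuals there are 20 × 19 / 2 = 190 unordered pairs, matching n=190 in FIG. 61. The Python sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks for every pair; it returns only the coefficient (no p-value), and it assumes each individual's usage is a mapping from gene name to frequency percentage, with missing genes treated as 0%. The function names are hypothetical.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (needs non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def pairwise_spearman(usage_by_individual, genes):
    """Spearman's rho of gene-usage percentages for every unordered pair of individuals."""
    return {
        (a, b): spearman([usage_by_individual[a].get(g, 0.0) for g in genes],
                         [usage_by_individual[b].get(g, 0.0) for g in genes])
        for a, b in combinations(sorted(usage_by_individual), 2)
    }
```

The mean of the values returned by `pairwise_spearman` corresponds to the horizontal line in the figure.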

FIG. 62 shows a summary of cancer idiotype peptide-sensitized immune cell therapy. Lymphocytes were collected from the patient (top left), and TCR or BCR repertoire analysis was conducted to predict an HLA-binding peptide. The predicted HLA-binding peptide is then used in a tailor-made peptide-sensitized CTL therapeutic method or a tailor-made peptide-sensitized DC vaccine therapeutic method. In particular, antibody therapeutic methods targeting tumor cells face problems when the target antigen is not expressed in tumor cells or is also expressed in normal cells. In contrast, a sequence specific to tumor cells is selected and utilized herein. Thus, therapy with higher specificity and fewer side effects is expected.

FIG. 63 shows a summary of an improved CTL method. In an existing LAK therapeutic method (top right) or CTL therapeutic method (bottom right), lymphocytes separated from the peripheral blood of a patient are activated by an anti-CD3 antibody and IL-2. In contrast, the improved CTL therapeutic method (left) separates dendritic cells and CD8⁺ T cells from the peripheral blood of a patient and uses an antigen peptide for coculture stimulation. Unlike the existing approach, which activates a wide range of T cells with an anti-CD3 antibody or IL-2, this method imparts antigen specificity to CD8⁺ T cells by using an antigen peptide, so therapy with a higher level of specificity and fewer side effects can be expected. Further, a high therapeutic effect can be expected because an individualized peptide created based on information obtained from the patient's tumor cells is utilized.

FIG. 64 shows a summary of a DC vaccine therapeutic method. Dendritic cells are separated from the patient (left) and are mixed and cultured with an antigen peptide. In the DC vaccine therapeutic method, an individualized peptide is created based on sequence information obtained from the patient's tumor cells. Thus, the therapeutic method acts more specifically on tumor cells without acting on normal cells, so a high therapeutic effect can be expected. Since a peptide is used as the antigen, it has the advantage, unlike a protein, of being readily synthesized chemically.

FIG. 65 shows a summary of a patient autologous immune cell therapeutic method. The improved CTL therapeutic method (left) separates dendritic cells and CD8⁺ T cells from the peripheral blood of a patient and uses an antigen peptide for coculture stimulation. Both the cytotoxic T cells and the antigen-presenting cells are introduced into the patient. Thus, a synergistic effect can be expected between the acute effect of the CTLs, to which specificity has been imparted, and the sustained effect of the dendritic cells acting as antigen-presenting cells.

FIG. 66 shows a summary of the isolation of a tailor-made cancer-specific T cell receptor gene and the isolation of a cancer-specific TCR gene by in vitro antigen stimulation. As shown, a tumor-specific TCR gene is obtained by coculturing T cells derived from a patient, inactivated cancer cells derived from the patient, and an antigen peptide. Once the genetic information is obtained, a cancer-specific TCR gene isolated by in vitro antigen stimulation can be prepared using any technology well known in the art. Such an isolated tailor-made cancer-specific T cell receptor gene and cancer-specific TCR gene can be used for the therapy and prevention of various cancers.

FIG. 67 shows a summary of the preparation of a cancer-specific TCR gene isolated by in vitro antigen stimulation. As shown, the obtained TCRα and TCRβ genes are introduced into a TCR-expressing viral vector (middle), which is used to infect and transform T lymphocytes from a patient.

FIG. 68 shows a summary of a cell processing therapeutic method. As shown, a tumor-specific TCR gene obtained by TCR repertoire analysis of T lymphocytes isolated from the patient (top right) is introduced into T lymphocytes derived from the patient, and the resulting tumor-specific T lymphocytes are introduced into the patient. Optimal TCR candidates can be artificially introduced as transgenes into the patient's lymphocytes, and the TCR exhibiting the highest reactivity to the patient's actual cancer tissue can be selected as the optimal TCR.

FIG. 69 shows a summary of a method of performing an in vitro stimulation test to assess efficacy and/or safety. The efficacy and/or safety of T lymphocytes introduced with a tumor-specific TCR is assessed by an in vitro stimulation test (downward arrow). T lymphocytes suitable for therapy are selected based on this in vitro assessment (upward arrow). Efficacy is assessed by coculturing lymphocytes introduced with a tumor-specific TCR with cancer cells derived from the patient and testing their reactivity. To assess safety, the same test is performed using normal cells instead of cancer cells.

DESCRIPTION OF EMBODIMENTS

The present invention is described hereinafter. Throughout the entire specification, a singular expression should be understood as encompassing the concept thereof in the plural form unless specifically noted otherwise. Thus, singular articles (e.g., “a”, “an”, “the”, and the like in the case of English) should also be understood as encompassing the concept thereof in the plural form unless specifically noted otherwise. Further, the terms used herein should be understood as being used in the meaning commonly used in the art, unless specifically noted otherwise. Thus, unless defined otherwise, all terminology and scientific technical terms used herein have the same meaning as commonly understood by those skilled in the art to which the present invention pertains. In case of a contradiction, the present specification (including the definitions) takes precedence.

As used herein, “database” refers to any database related to genes and, in the present invention, especially to a database comprising T cell receptor and B cell receptor repertoires. Examples of such a database include, but are not limited to, the IMGT (the international ImMunoGeneTics information system, www dot imgt dot org) database, the DNA Data Bank of Japan (DDBJ, www dot ddbj dot nig dot ac dot jp) database, the GenBank (National Center for Biotechnology Information, www dot ncbi dot nlm dot nih dot gov/genbank/) database, ENA (EMBL (European Molecular Biology Laboratory), www dot ebi dot ac dot uk/ena), and the like.

As used herein, “genetic sequence analysis” or “gene sequencing” refers to analysis of the constituent nucleic acid sequence and/or amino acid sequence of a gene. “Genetic sequence analysis” or “gene sequencing” includes any analysis associated with a gene, such as determination of a base or residue, determination of homology, determination of a domain, or determination of a latent function.

As used herein, “T cell receptor (TCR)” refers to a T cell receptor or aT cell antigen receptor, or a receptor expressed on a cell membrane of aT cell that regulates an immune system, and recognizes an antigen. Thereare α chain, β chain, γ chain and δ chain, constituting an αβ or γδdimer. A TCR consisting of the former combination is called an αβ TCRand a TCR consisting of the latter combination is called a γδ TCR. Tcells having such TCRs are called αβ T cell or γδ T cell. The structureis very similar to a Fab fragment of an antibody produced by a B cell,and recognizes an antigen molecule bound to an MHC molecule. Since a TCRgene of a mature T cell has undergone gene rearrangement, an individualhas a diverse TCR and is able to recognize various antigens. A TCRfurther binds to an invariable CD3 molecule present in a cell membraneto form a complex. CD3 has an amino acid sequence called the ITAM(immunoreceptor tyrosine-based activation motif) in an intracellularregion. This motif is considered to be involved in intracellularsignaling. Each TCR chain is composed of a variable section (V) and aconstant section (C). The constant section penetrates through the cellmembrane and has a short cytoplasm portion. The variable section ispresent extracellularly and binds to an antigen-MHC complex. Thevariable section has three regions called a hypervariable section or acomplementarity determining region (CDR), which binds to an antigen-MHCcomplex. The three CDRs are each called CDR1, CDR2, and CDR3. For a TCR,CDR1 and CDR2 are considered to bind to an MHC, while CDR3 is consideredto bind to an antigen. Gene rearrangement of a TCR is similar to theprocess for a B cell receptor known as an immunoglobulin. In generearrangement of an αβ TCR, VDJ rearrangement of a β chain is firstperformed and then VJ rearrangement of an α chain is performed. 
Since the δ chain gene is deleted from the chromosome during rearrangement of the α chain, a T cell having an αβ TCR does not simultaneously have a γδ TCR. Conversely, in a T cell having a γδ TCR, a signal mediated by this TCR suppresses expression of the β chain. Thus, a T cell having a γδ TCR does not simultaneously have an αβ TCR.

As used herein, “B cell receptor (BCR)”, also called a B cell antigen receptor, refers to a complex composed of an Igα/Igβ (CD79a/CD79b) heterodimer (α/β) associated with a membrane-bound immunoglobulin (mIg). The mIg subunit binds to an antigen to induce aggregation of the receptors, while the α/β subunit transmits a signal into the cell. When aggregated, BCRs are understood to rapidly activate the Src family kinases Lyn, Blk, and Fyn, as well as the tyrosine kinases Syk and Btk. Reflecting the complexity of BCR signaling, the outcomes differ greatly and include survival, tolerance (anergy; lack of responsiveness to antigen), apoptosis, cell division, and differentiation into antibody-producing cells or memory B cells. Several hundred million types of T cells with different TCR variable region sequences are produced, and several hundred million types of B cells with different BCR (or antibody) variable region sequences are produced. Individual TCR and BCR sequences vary owing to introduced mutations or rearrangement of the genomic sequence. Thus, it is possible to obtain a clue to the antigen specificity of a T cell or a B cell by determining the genomic sequence of its TCR/BCR or the sequence of the corresponding mRNA (cDNA).

As used herein, “V region” refers to the variable section (V) of the variable region of a TCR chain or a BCR chain.

As used herein, “D region” refers to the D region of the variable region of a TCR chain or a BCR chain.

As used herein, “J region” refers to the J region of the variable region of a TCR chain or a BCR chain.

As used herein, “C region” refers to the constant section (C) region of a TCR chain or a BCR chain.

As used herein, “repertoire of a variable region” refers to a collection of V(D)J regions created in any manner by gene rearrangement in a TCR or BCR. Terms such as TCR repertoire and BCR repertoire are used, which in some cases are also called, for example, T cell repertoire or B cell repertoire. For instance, “T cell repertoire” refers to a collection of lymphocytes characterized by expression of a T cell receptor (TCR), which plays an important role in antigen recognition. A change in the T cell repertoire provides a significant indicator of immune status in physiological and disease conditions. Thus, T cell repertoires have been analyzed to identify antigen-specific T cells involved in the pathology of a disease and to diagnose abnormalities in T lymphocytes. Comparison of variable region usage by fluorescence activated cell sorter analysis using a large panel of TCR variable region specific antibodies (van den Beemd R et al. (2000) Cytometry 40: 336-345; MacIsaac C et al. (2003) J Immunol Methods 283: 9-15; Tembhare P et al. (2011) Am J Clin Pathol 135: 890-900; Langerak A W et al. (2001) Blood 98: 165-173), by polymerase chain reaction (PCR) using multiple primers (Rebai N et al. (1994) Proc Natl Acad Sci USA 91: 1529-1533), or by PCR-based enzyme-linked immunosorbent assay (Matsutani T et al. (1997) Hum Immunol 56: 57-69; Matsutani T et al. (2000) Br J Haematol 109: 759-769) has been extensively used to detect changes in the T cell repertoire. Analysis of the CDR3 length distribution, known as CDR3 spectratyping, is based on the length variation caused by the addition of non-templated nucleotides in the V-(D)-J region and has been used to assess the clonality and diversity of T cells (Matsutani T et al. (2007) Mol Immunol 44: 2378-2387; Matsutani T et al. (2011) Mol Immunol 48: 623-629). To further identify the antigen specificity of a T cell, PCR cloning of a TCR clonotype and subsequent sequencing of the antigen recognition region and CDR3 were required.
Such conventional approaches are commonly used; however, they are time- and labor-intensive methods for studying a TCR repertoire.
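The spectratyping idea above, summarizing a repertoire by how its CDR3 lengths are distributed, can be sketched in sequence space as follows. This is a minimal illustration only, assuming CDR3 amino acid sequences have already been extracted from reads; the function name `cdr3_length_distribution` and the example sequences are hypothetical, and conventional spectratyping measures fragment lengths electrophoretically rather than from sequences.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_seqs):
    """Return {CDR3 length: fraction of sequences}, sorted by length.

    A sequence-based analogue of a CDR3 spectratype: each CDR3 is counted
    once per occurrence, so expanded clones weigh more heavily.
    """
    counts = Counter(len(seq) for seq in cdr3_seqs)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical CDR3 amino acid sequences, for illustration only.
example = ["CASSLGQAYEQYF", "CASSPGTGELFF", "CASSLGQAYEQYF", "CSARDRTNEKLFF"]
print(cdr3_length_distribution(example))  # {12: 0.25, 13: 0.75}
```

A skewed distribution concentrated at one or a few lengths would suggest clonal expansion, whereas a broad, roughly bell-shaped distribution suggests a diverse repertoire.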

As used herein, “quantitative analysis” refers to analysis that is quantitative in nature. In the present invention, “quantitative analysis” refers to repertoire analysis performed in a form that reflects the amount of each clone originally present.
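As a rough sketch of what quantitative output can look like, the snippet below counts reads per clonotype and converts the counts to frequencies. It assumes each read has already been assigned a (V gene, J gene, CDR3) key; that keying scheme, the function name, and the example reads are illustrative assumptions. Read counts track the original clone amounts only to the extent that amplification is unbiased, which is the premise of the method described here.

```python
from collections import Counter


def clonotype_frequencies(reads):
    """Return {clonotype: fraction of reads}, most frequent first.

    `reads` is an iterable of hashable clonotype keys, one per sequencing
    read, e.g. (v_gene, j_gene, cdr3) tuples.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.most_common()}


# Hypothetical reads: three from one clone and one from another.
reads = [("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF")] * 3 + [
    ("TRBV20-1", "TRBJ2-1", "CSARDRTNEKLFF")
]
for clone, freq in clonotype_frequencies(reads).items():
    print(clone, f"{freq:.2f}")
```

Keying on V gene alone, or on a V-J pair, gives the gene usage frequencies instead of clone frequencies with the same function.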

As used herein, “sample” includes, but is not limited to, components derived from a subject (e.g., a body fluid such as blood).

As used herein, “complementary DNA” refers to a DNA forming a complementary strand with respect to a target nucleic acid molecule, e.g., an RNA included in an RNA sample or the like derived from a target cell.
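The base-pairing relationship in this definition can be illustrated with a short helper that writes out the first-strand cDNA for an RNA: read 5′ to 3′, the cDNA is the reverse complement of the RNA, with T in place of U. This is a sketch for illustration only; the function name is hypothetical, and it does not handle modified or ambiguous bases.

```python
# Complement of each RNA base, written as DNA (U pairs with A; A pairs with T).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(rna):
    """Return the cDNA (5'->3') complementary to an RNA given 5'->3'."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("expected only A, C, G, U")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```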

As used herein, “double stranded complementary DNA” refers to DNAs that are complementary to each other and form a double strand. In the present invention, it can be produced, for example, using as a template a complementary DNA that forms a complementary strand with respect to an RNA included in an RNA sample or the like derived from a target cell.

As used herein, “common adaptor primer sequence” refers to the sequence of a portion that is added in common to all sequences in an adaptor-added double stranded complementary DNA and that is used as a primer in the first PCR amplification reaction and the second PCR amplification reaction of the present invention.

As used herein, “adaptor-added double stranded complementary DNA” refers to a DNA in which a common adaptor primer sequence is added to the various double stranded complementary DNAs in a sample. It is used as a template in the first PCR amplification reaction of the present invention.

As used herein, “common adaptor primer” refers to a DNA used as a primer in the first PCR amplification reaction and the second PCR amplification reaction of the present invention, wherein a single common sequence is used in each reaction.

As used herein, “first TCR or BCR C region specific primer” refers to a primer used in the first PCR amplification reaction of the present invention, comprising a sequence specific to a C region of a TCR or a BCR.

FIG. 18 shows, on the top row, an example of a possible primer setting region of TRAC (the target sequence is an artificially spliced functional TRAC exon region sequence consisting of exons EX1, EX2, and EX3; a primer can be set throughout the entire length). The bottom row shows an example of a possible primer setting region of TRBC (the target sequence is an artificially spliced functional TRBC exon region sequence consisting of exons EX1, EX2, EX3, and EX4; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 19 shows, on the top row, an example of a possible primer setting region of TRGC (the target sequence is an artificially spliced functional TRGC exon region sequence consisting of exons EX1, EX2, and EX3; a primer can be set throughout the entire length). The bottom row shows an example of a possible primer setting region of TRDC (the target sequence is an artificially spliced functional TRDC exon region sequence consisting of exons EX1, EX2, and EX3; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 20 shows an example of a possible primer setting region of IGHM (the target sequence is an artificially spliced functional IGHM exon region sequence consisting of exons CH1, CH2, CH3, and CH4; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 21 shows an example of a possible primer setting region of IGHA (the target sequence is an artificially spliced functional IGHA exon region sequence; the secreted form consists of exons CH1, H, CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H, CH2, CH3, M1, and M2; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 22 shows an example of a possible primer setting region of IGHG (the target sequence is an artificially spliced functional IGHG exon region sequence; the secreted form consists of exons CH1, H (H1, H2, H3, H4), CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H (H1, H2, H3, H4), CH2, CH3, M1, and M2; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 23 shows an example of a possible primer setting region of IGHD (the target sequence is an artificially spliced functional IGHD exon region sequence; the secreted form consists of exons CH1, H1, H2, CH2, CH3, and CH-S, and the membrane-bound form consists of CH1, H1, H2, CH2, CH3, M1, and M2; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 24 shows an example of a possible primer setting region of IGHE (the target sequence is an artificially spliced functional IGHE exon region sequence; the secreted form consists of exons CH1, CH2, CH3, and CH-S, and the membrane-bound form consists of exons CH1, CH2, CH3, M1, and M2; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

FIG. 25 shows, on the top row, an example of a possible primer setting region of IGKC (the target sequence is a functional IGKC CL sequence; a primer can be set throughout the entire length). The bottom row shows an example of a possible primer setting region of IGLC (the target sequence is a functional IGLC CL sequence; a primer can be set throughout the entire length). In addition, a first TCR or BCR C region specific primer can be set on the most 5′ terminal side of a complementary DNA. Once a first TCR or BCR C region specific primer is set, a second TCR or BCR C region specific primer can be set downstream thereof. Furthermore, once the second TCR or BCR C region specific primer is set, a third TCR or BCR C region specific primer can be set. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

Specifically, the first TCR or BCR C region specific primer has, for example, the following sequences: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), or CE1 (SEQ ID NO: 17) for BCRs, and CA1 (SEQ ID NO: 35) or CB1 (SEQ ID NO: 37) for TCRs. However, the structure is not limited thereto. Such a primer sequence can be set in, but is not limited to, the following specific ranges. The first, second, and third primers can each be set anywhere within the entire range, but their positions can be determined relative to one another.

α sequence of TCR: base number 213 to base number 235 of SEQ ID NO: 1376 (FIG. 18)

β sequence of TCR: base number 278 to base number 300 of SEQ ID NO: 1377 (FIG. 18)

γ sequence of TCR: base number 184 to base number 201 of SEQ ID NO: 1378 (FIG. 19)

δ sequence of TCR: base number 231 to base number 249 of SEQ ID NO: 1379 (FIG. 19)

IgM heavy chain sequence of BCR: base number 77 to base number 95 of SEQ ID NO: 1380 (FIG. 20)

IgA heavy chain sequence of BCR: base number 189 to base number 208 of SEQ ID NO: 1381 (FIG. 21)

IgG heavy chain sequence of BCR: base number 262 to base number 282 of SEQ ID NO: 1382 (FIG. 22)

IgD heavy chain sequence of BCR: base number 164 to base number 183 of SEQ ID NO: 1383 (FIG. 23)

IgE heavy chain sequence of BCR: base number 182 to base number 199 of SEQ ID NO: 1384 (FIG. 24)

Igκ chain constant region sequence of BCR: base number 230 to base number 248 of SEQ ID NO: 1385 (FIG. 25)

Igλ chain sequence of BCR: base number 273 to base number 291 of SEQ ID NO: 1386 (FIG. 25)

As used herein, “specific” refers to binding to a target sequence while binding poorly to, and preferably not binding to, other sequences, at least within the pool of target TCRs or BCRs and preferably among all TCR or BCR sequences present. A specific sequence is advantageously and preferably, but not necessarily, fully complementary to the target sequence.

As used herein, “sufficiently specific (to a C region of interest)” refers to having sufficient specificity for a gene amplification reaction. A sequence identical to the target C region is advantageous and preferable, but the sequence is not necessarily limited thereto.

As used herein, “first PCR amplification reaction” is a PCR amplification reaction performed in the first stage of the method of preparing a sample of the present invention.

As used herein, “not homologous with other genetic sequences” refers to having homology low enough that a gene amplification reaction does not occur with sequences other than the sequence of interest (e.g., the C region of interest of a TCR or BCR).

As used herein, “comprising a mismatching base (between subtypes) downstream” refers to comprising, downstream of the sequence used as a primer, a base that differs between subtypes. When a primer is set in this way, the amplicon has a different sequence for each subtype, so the subtype can be identified by determining the sequence of the amplicon.
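The identification step can be sketched as a nearest-reference call: compare the bases read just downstream of the primer against the corresponding segment of each subtype and pick the subtype with the fewest mismatches, declining to call when two subtypes tie. The reference segments below are toy strings, not real C region sequences, and the function names are hypothetical; a practical pipeline would also need to handle sequencing errors and indels.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def call_subtype(segment, references):
    """Return (subtype, mismatches); subtype is None if the best score ties.

    `segment` is the read sequence immediately downstream of the primer;
    `references` maps subtype name -> the same-length reference segment.
    """
    scored = sorted((hamming(segment, ref), name) for name, ref in references.items())
    best_mm, best_name = scored[0]
    if len(scored) > 1 and scored[1][0] == best_mm:
        return None, best_mm
    return best_name, best_mm


# Toy reference segments differing at two positions (not real sequences).
refs = {"subtype_1": "ACCTGAGTCA", "subtype_2": "ACCTTAGTCG"}
print(call_subtype("ACCTGAGTCA", refs))  # ('subtype_1', 0)
print(call_subtype("ACCTGAGTCG", refs))  # (None, 1)
```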

As used herein, “second TCR or BCR C region specific primer” refers to a primer used in the second PCR amplification reaction of the present invention, comprising a sequence specific to a C region of a TCR or BCR. A second TCR or BCR C region specific primer is designed to have a sequence that completely matches the TCR or BCR C region at a position downstream of the sequence of the first TCR or BCR C region specific primer, to comprise a sequence that is not homologous with other genetic sequences, and to comprise a base that mismatches between subtypes downstream of the primer when amplified. Examples of such a sequence include, but are not limited to, CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), and CE2 (SEQ ID NO: 18) for BCRs, and CA2 (SEQ ID NO: 36) and CB2 (SEQ ID NO: 38) for TCRs. Such a primer sequence can be set in, but is not limited to, the following specific ranges. The first, second, and third primers can each be set anywhere within the entire range, but their positions can be determined relative to one another. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

α sequence of TCR: base number 146 to base number 168 of SEQ ID NO: 1376 (FIG. 18)

β sequence of TCR: base number 205 to base number 227 of SEQ ID NO: 1377 (FIG. 18)

γ sequence of TCR: base number 141 to base number 160 of SEQ ID NO: 1378 (FIG. 19)

δ sequence of TCR: base number 135 to base number 155 of SEQ ID NO: 1379 (FIG. 19)

IgM heavy chain sequence of BCR: base number 43 to base number 62 of SEQ ID NO: 1380 (FIG. 20)

IgA heavy chain sequence of BCR: base number 141 to base number 161 of SEQ ID NO: 1381 (FIG. 21)

IgG heavy chain sequence of BCR: base number 163 to base number 183 of SEQ ID NO: 1382 (FIG. 22)

IgD heavy chain sequence of BCR: base number 125 to base number 142 of SEQ ID NO: 1383 (FIG. 23)

IgE heavy chain sequence of BCR: base number 155 to base number 173 of SEQ ID NO: 1384 (FIG. 24)

Igκ chain constant region sequence of BCR: base number 103 to base number 120 of SEQ ID NO: 1385 (FIG. 25)

Igλ chain sequence of BCR: base number 85 to base number 100 of SEQ ID NO: 1386 (FIG. 25)

As used herein, “second PCR amplification reaction” refers to a PCR amplification reaction performed in a nested form after the first PCR reaction, using the product of the first PCR reaction as a template, in the sample production for analysis of the present invention. In the present invention, the amplification reaction is performed using a common adaptor primer and a second TCR or BCR C region specific primer. In this regard, the second TCR or BCR C region specific primer is designed to have a sequence that completely matches the TCR or BCR C region at a position downstream of the sequence of the first TCR or BCR C region specific primer, to comprise a sequence that is not homologous with other genetic sequences, and to comprise a base that mismatches between subtypes downstream of the primer when amplified.

As used herein, “third PCR amplification reaction” is a PCR amplification reaction performed after the second nested PCR reaction, using the product of the second nested PCR reaction as a template; its product is used in the sample production for analysis of the present invention. The third PCR amplification reaction is performed, after the second nested PCR and using the product of the second nested PCR reaction as a template, with an added common adaptor primer, in which a first additional adaptor nucleic acid sequence is added to the nucleic acid sequence of the common adaptor primer, and an adaptor-added third TCR or BCR C region specific primer, in which a second additional adaptor nucleic acid sequence and a molecule identification sequence (MID Tag sequence) are added to a third TCR or BCR C region specific sequence. The adaptor-added third TCR or BCR C region specific primer may comprise a sequence for verifying a nucleic acid sequence position, called a key sequence. A specific example of the added common adaptor primer that can be used is Adaptor B (SEQ ID NO: 1375)-TAATACGACTCCGAATTCCC, and specific examples of adaptor-added third TCR or BCR C region specific primers that can be used include Adaptor A (SEQ ID NO: 39)-key(TCAG)-MID1 (SEQ ID NO: 40)-AAAGGGTTGGGGCGGATGC (SEQ ID NO: 1387) (entire primer is SEQ ID NO: 7), Adaptor A (SEQ ID NO: 39)-key(TCAG)-MID2 (SEQ ID NO: 41)-CCGCTTTCGCTCCAGGTCAC (SEQ ID NO: 1388) (entire primer is SEQ ID NO: 10), Adaptor A (SEQ ID NO: 39)-key(TCAG)-MID3 (SEQ ID NO: 42)-TGAGTTCCACGACACCGTCAC (SEQ ID NO: 1389) (entire primer is SEQ ID NO: 13), Adaptor A (SEQ ID NO: 39)-key(TCAG)-MID4 (SEQ ID NO: 43)-CCCAGTTATCAAGCATGCC (SEQ ID NO: 1390) (entire primer is SEQ ID NO: 16), and Adaptor A (SEQ ID NO: 39)-key(TCAG)-MID5 (SEQ ID NO: 44)-CATTGGAGGGAATGTTTTTG (SEQ ID NO: 1391) (entire primer is SEQ ID NO: 19).
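The modular layout of the adaptor-added third primer (adaptor, key, MID tag, then the C region specific part) can be sketched as simple string concatenation. The adaptor and key values are those given in this description (Adaptor A, SEQ ID NO: 39, and TCAG), and the C region part is the CM3-GS portion quoted above (SEQ ID NO: 1387). Because the MID1 sequence (SEQ ID NO: 40) is not spelled out in this section, the sketch substitutes the tag of SEQ ID NO: 1325 listed in this description as an assumed stand-in, so the result is not claimed to equal SEQ ID NO: 7. The function name is hypothetical.

```python
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGAC"  # SEQ ID NO: 39 (second additional adaptor)
KEY = "TCAG"                               # 4-base key sequence
VALID = set("ACGT")


def build_third_primer(mid, c_region_specific, adaptor=ADAPTOR_A, key=KEY):
    """Concatenate adaptor + key + MID tag + C region specific sequence (5'->3')."""
    parts = (adaptor, key, mid, c_region_specific)
    for part in parts:
        if not part or set(part) - VALID:
            raise ValueError(f"invalid DNA segment: {part!r}")
    return "".join(parts)


# Assumed stand-in MID: the tag of SEQ ID NO: 1325 (not necessarily MID1).
primer = build_third_primer("ACGAGTGCGT", "AAAGGGTTGGGGCGGATGC")
print(primer, len(primer))  # total length 26 + 4 + 10 + 19 = 59
```

Keeping each segment separate makes it easy to swap only the MID tag when preparing primers for several samples that will be pooled in one sequencing run.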

As used herein, “first additional adaptor nucleic acid sequence” is a sequence added to a primer used in the third PCR amplification reaction of the present invention; it is added to the nucleic acid sequence of the common adaptor primer. The first additional adaptor nucleic acid sequence may be different from or the same as the second additional adaptor nucleic acid sequence. Such a nucleic acid sequence is characterized by being suitable for binding to a DNA capture bead and for an emPCR reaction (see, for example, Chee-Seng, Ku; En Yun, Loy; Yudi, Pawitan; and Kee-Seng, Chia. Next Generation Sequencing Technologies and Their Applications. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester. April 2010; Metzker M L. Sequencing technologies—the next generation. Nat Rev Genet. 2010 January; 11(1): 31-46). Any sequence having this characteristic may be used. Specifically, CCTATCCCCTGTGTGCCTTGGCAGTC (SEQ ID NO: 1375) is used, but the sequence is not limited thereto.

As used herein, “second additional adaptor nucleic acid sequence” is a sequence added to a primer used in the third PCR amplification reaction of the present invention; it is optionally used together with a molecule identifying sequence (e.g., a MID Tag sequence) and/or a key sequence, and is added to a third TCR or BCR C region specific sequence to constitute an adaptor-added third TCR or BCR C region specific primer. The second additional adaptor nucleic acid sequence may be different from or the same as the first additional adaptor nucleic acid sequence. Such a nucleic acid sequence is characterized by being suitable for an emPCR reaction, and any sequence having this characteristic may be used. Specifically, CCATCTCATCCCTGCGTGTCTCCGAC (SEQ ID NO: 39) is used, but the sequence is not limited thereto.

A “key sequence” as used herein is a sequence added to a primer used in the third PCR amplification reaction of the present invention; it is optionally used together with a molecule identifying sequence (e.g., a MID Tag sequence) and is added to a third TCR or BCR C region specific sequence to constitute an adaptor-added third TCR or BCR C region specific primer. A key sequence may be any sequence as long as a nucleic acid sequence position can be verified. A 4-base key sequence (TCAG) is used, but the key sequence is not limited thereto.

As used herein, “molecule identifying (MID Tag) sequence” is a sequence that imparts uniqueness so that an amplicon can be identified. It is therefore preferably different from the sequence of interest, and preferably a sequence that does not affect amplification. Examples of such a sequence include, but are not limited to, the sequences of SEQ ID NOs: 1325-1374. The criteria for selecting an identification sequence (tag sequence) are explained as follows, followed by representative examples. A tag sequence is a base sequence added to distinguish each sample when a plurality of samples are mixed and sequenced simultaneously. Since the reads from one sample correspond to one tag sequence, it is possible to identify the sample from which an acquired read sequence is derived. A tag sequence is any sequence of the four bases A, C, G, and T. Theoretically, about a million (4¹⁰) types of sequences can be created with 10 bases and about a trillion (4²⁰) types with 20 bases. The length of a tag sequence is preferably between 2 and 40 bases, and more preferably between 6 and 10 bases. It is also desirable to use a sequence that is free of consecutive identical bases (AA, CC, GG, or TT).
Representative tags that can be used herein include, but are not limited to, the following: ACGAGTGCGT (SEQ ID NO: 1325), ACGCTCGACA (SEQ ID NO: 1326), AGACGCACTC (SEQ ID NO: 1327), AGCACTGTAG (SEQ ID NO: 1328), ATCAGACACG (SEQ ID NO: 1329), ATATCGCGAG (SEQ ID NO: 1330), CGTGTCTCTA (SEQ ID NO: 1331), CTCGCGTGTC (SEQ ID NO: 1332), TAGTATCAGC (SEQ ID NO: 1333), TCTCTATGCG (SEQ ID NO: 1334), TGATACGTCT (SEQ ID NO: 1335), TACTGAGCTA (SEQ ID NO: 1336), CATAGTAGTG (SEQ ID NO: 1337), CGAGAGATAC (SEQ ID NO: 1338), ATACGACGTA (SEQ ID NO: 1339), TCACGTACTA (SEQ ID NO: 1340), CGTCTAGTAC (SEQ ID NO: 1341), TCTACGTAGC (SEQ ID NO: 1342), TGTACTACTC (SEQ ID NO: 1343), ACGACTACAG (SEQ ID NO: 1344), CGTAGACTAG (SEQ ID NO: 1345), TACGAGTATG (SEQ ID NO: 1346), TACTCTCGTG (SEQ ID NO: 1347), TAGAGACGAG (SEQ ID NO: 1348), TCGTCGCTCG (SEQ ID NO: 1349), ACATACGCGT (SEQ ID NO: 1350), ACACGACGACT (SEQ ID NO: 1351), ACACGTAGTAT (SEQ ID NO: 1352), ACACTACTCGT (SEQ ID NO: 1353), ACGACACGTAT (SEQ ID NO: 1354), ACGAGTAGACT (SEQ ID NO: 1355), ACGCGTCTAGT (SEQ ID NO: 1356), ACGTACACACT (SEQ ID NO: 1357), ACGTACTGTGT (SEQ ID NO: 1358), ACGTAGATCGT (SEQ ID NO: 1359), ACTACGTCTCT (SEQ ID NO: 1360), ACTATACGAGT (SEQ ID NO: 1361), ACTCGCGTCGT (SEQ ID NO: 1362), AGTCGTGGTGT (SEQ ID NO: 1363), ATACTAGGTGT (SEQ ID NO: 1364), ACGAGTGGTGT (SEQ ID NO: 1365), ATACGTGGCGT (SEQ ID NO: 1366), AGTCTACGCGT (SEQ ID NO: 1367), ACTAGAGGCGT (SEQ ID NO: 1368), AGTGTGTGCGT (SEQ ID NO: 1369), ACACAGTGCGT (SEQ ID NO: 1370), ACGATCTGCGT (SEQ ID NO: 1371), AGAGACGGAGT (SEQ ID NO: 1372), ACTCGTAGAGT (SEQ ID NO: 1373), and ACGACGGGAGT (SEQ ID NO: 1374).
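The tag arithmetic and selection criterion above, and the way a tag lets reads be assigned back to their samples, can be sketched as follows. The counts follow directly from the text: 4ⁿ possible tags of length n, and 4 × 3ⁿ⁻¹ if no two adjacent bases may be identical. The demultiplexing step assumes reads begin with the key TCAG followed by the tag, as in the primer layout described for the third PCR amplification reaction. The sample names, function names, and five-tag subset are illustrative assumptions, and exact prefix matching with no error tolerance is a simplification.

```python
KEY = "TCAG"
# Subset of the tags listed above (SEQ ID NOs: 1325-1329); sample names are hypothetical.
TAG_TO_SAMPLE = {
    "ACGAGTGCGT": "sample_1",
    "ACGCTCGACA": "sample_2",
    "AGACGCACTC": "sample_3",
    "AGCACTGTAG": "sample_4",
    "ATCAGACACG": "sample_5",
}


def n_possible_tags(length):
    """Number of distinct tags of a given length over A, C, G, T."""
    return 4 ** length


def n_homopolymer_free_tags(length):
    """Tags with no identical adjacent bases: 4 choices, then 3 per position."""
    if length < 1:
        raise ValueError("length must be >= 1")
    return 4 * 3 ** (length - 1)


def is_homopolymer_free(tag):
    """True if the tag contains no AA, CC, GG, or TT."""
    return all(a != b for a, b in zip(tag, tag[1:]))


def demultiplex(read, tag_to_sample=TAG_TO_SAMPLE, key=KEY):
    """Return (sample, read after key and tag), or (None, read) if unmatched."""
    if not read.startswith(key):
        return None, read
    rest = read[len(key):]
    # Try longer tags first so a shorter tag cannot shadow a longer one.
    for tag in sorted(tag_to_sample, key=len, reverse=True):
        if rest.startswith(tag):
            return tag_to_sample[tag], rest[len(tag):]
    return None, read


print(n_possible_tags(10), n_homopolymer_free_tags(10))  # 1048576 78732
print(demultiplex("TCAG" + "ACGCTCGACA" + "CCGCTTTCG"))   # ('sample_2', 'CCGCTTTCG')
```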

As used herein, “third TCR or BCR C region specific sequence” is a sequence specific to a C region of a TCR or a BCR that is located further downstream than the first TCR or BCR C region specific sequence and the second TCR or BCR C region specific sequence. It is a sequence used for constituting a third TCR or BCR C region specific primer. Specific examples thereof include the sequence of a specific portion in CM3-GS (SEQ ID NO: 1387), the sequence of a specific portion in CA3-GS (SEQ ID NO: 1388), the sequence of a specific portion in CG3-GS (SEQ ID NO: 1389), the sequence of a specific portion in CD3-GS (SEQ ID NO: 1390), and the sequence of a specific portion in CE3-GS (SEQ ID NO: 1391) for BCRs, and specific sequences in HuVaF or HuVbF in Table 6 (SEQ ID NOs: 40-60) for TCRs (corresponding to base number 51 to base number 73 of SEQ ID NO: 1376 (FIG. 18) and base number 69 to base number 91 of SEQ ID NO: 1377 (FIG. 18)). More specifically, such a primer sequence can be set in, but is not limited to, specific ranges such as the following. The first, second, and third primers can each be set anywhere within the entire range, but their positions can be determined relative to one another. That is, when the first is set, the second is downstream thereof, and the third is further downstream. Theoretically, it is understood that the primers only need to be downstream by the length of the primer.

α sequence of TCR: base number 51 to base number 73 of SEQ ID NO: 1376 (FIG. 18)

β sequence of TCR: base number 69 to base number 91 of SEQ ID NO: 1377 (FIG. 18)

γ sequence of TCR: base number 34 to base number 53 of SEQ ID NO: 1378 (FIG. 19)

δ sequence of TCR: base number 61 to base number 78 of SEQ ID NO: 1379 (FIG. 19)

IgM heavy chain sequence of BCR: base number 7 to base number 25 of SEQ ID NO: 1380 (FIG. 20)

IgA heavy chain sequence of BCR: base number 115 to base number 134 of SEQ ID NO: 1381 (FIG. 21)

IgG heavy chain sequence of BCR: base number 109 to base number 129 of SEQ ID NO: 1382 (FIG. 22)

IgD heavy chain sequence of BCR: base number 78 to base number 96 of SEQ ID NO: 1383 (FIG. 23)

IgE heavy chain sequence of BCR: base number 45 to base number 64 of SEQ ID NO: 1384 (FIG. 24)

Igκ chain constant region sequence of BCR: base number 75 to base number 92 of SEQ ID NO: 1385 (FIG. 25)

Igλ chain sequence of BCR: base number 52 to base number 69 of SEQ ID NO: 1386 (FIG. 25) (this SEQ ID NO is also used for CM).
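Because the first, second, and third primer ranges in this description are all given as base numbers on the same C region reference sequences, the nesting rule (second primer downstream of the first, third downstream of the second, without overlap) can be checked mechanically. The sketch assumes the base numbers count along the sense strand of each C region sequence and that the C region primers anneal to it and extend toward the V region, so that “downstream” in the primer's direction corresponds to smaller base numbers; that reading is consistent with all eleven sets of ranges listed here. The ranges are copied from this description; the function name is hypothetical.

```python
# Inclusive base-number ranges from this description: (first, second, third primer).
RANGES = {
    "TCR alpha, SEQ ID NO: 1376": [(213, 235), (146, 168), (51, 73)],
    "TCR beta, SEQ ID NO: 1377": [(278, 300), (205, 227), (69, 91)],
    "TCR gamma, SEQ ID NO: 1378": [(184, 201), (141, 160), (34, 53)],
    "TCR delta, SEQ ID NO: 1379": [(231, 249), (135, 155), (61, 78)],
    "IgM, SEQ ID NO: 1380": [(77, 95), (43, 62), (7, 25)],
    "IgA, SEQ ID NO: 1381": [(189, 208), (141, 161), (115, 134)],
    "IgG, SEQ ID NO: 1382": [(262, 282), (163, 183), (109, 129)],
    "IgD, SEQ ID NO: 1383": [(164, 183), (125, 142), (78, 96)],
    "IgE, SEQ ID NO: 1384": [(182, 199), (155, 173), (45, 64)],
    "Ig kappa, SEQ ID NO: 1385": [(230, 248), (103, 120), (75, 92)],
    "Ig lambda, SEQ ID NO: 1386": [(273, 291), (85, 100), (52, 69)],
}


def is_nested(ranges):
    """True if each later primer range ends before the previous range starts.

    Under the stated assumption, this means each later primer lies fully
    downstream (toward the V region) of the previous one, with no overlap.
    """
    return all(nxt_end < prev_start
               for (prev_start, _), (_, nxt_end) in zip(ranges, ranges[1:]))


for name, ranges in RANGES.items():
    lengths = [end - start + 1 for start, end in ranges]
    print(f"{name}: nested={is_nested(ranges)}, range lengths={lengths}")
```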

As used herein, “third TCR or BCR C region specific primer” is a primer used in the third PCR amplification reaction of the present invention, which is designed to have a sequence that completely matches the TCR or BCR C region at a position downstream of the sequence of the second TCR or BCR C region specific primer, to comprise a sequence that is not homologous with other genetic sequences, and to comprise a base that mismatches between subtypes downstream of the primer when amplified. The primer further comprises an adaptor sequence, a key sequence, and an identification sequence. Specific examples thereof include, but are not limited to, CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16), and CE3-GS (SEQ ID NO: 19). Any sequence that can be set as the third TCR or BCR C region specific sequence described above can be used.

As used herein, “isotype” refers to IgM, IgA, IgG, IgE, IgD, or the like, which belong to the same type but have sequences that differ from one another. Isotypes can be denoted by various abbreviations or gene symbols.

As used herein, “subtype” refers to a type within a type, present in IgA and IgG for BCRs: IgG1, IgG2, IgG3, and IgG4 for IgG, and IgA1 and IgA2 for IgA. Subtypes are also known to be present in the β and γ chains of TCRs, namely TRBC1 and TRBC2, and TRGC1 and TRGC2, respectively.

As used herein, “complete match” refers to 100% identity when sequences are compared with each other.

As used herein, “complete match with all C region allelic sequences of the same isotype” refers to matching every C region allelic sequence of the same isotype when the sequences are aligned. Since C region sequences are not identical across all alleles, even within the same isotype, using a sequence that completely matches all C region allelic sequences of the same isotype is advantageous for immediately determining the isotype when the sequence of an amplicon is determined.
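One way to look for primer sites that satisfy this condition is to scan the aligned allelic sequences of one isotype for windows that are identical in every allele. The sketch below assumes the alleles are already aligned without gaps and uses toy strings rather than real C region alleles; the function name is hypothetical.

```python
def conserved_windows(alleles, k):
    """Return 0-based starts of length-k windows identical in all aligned alleles."""
    if not alleles or k < 1:
        raise ValueError("need at least one allele and k >= 1")
    length = min(len(a) for a in alleles)
    return [i for i in range(length - k + 1)
            if len({a[i:i + k] for a in alleles}) == 1]


# Toy aligned alleles that differ only at position 4 (not real sequences).
alleles = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
print(conserved_windows(alleles, 4))  # [0, 5, 6]
```

In practice k would be set to the intended primer length, and the surviving windows would then be screened with the other criteria described in this section.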

As used herein, “unlikely to have a homodimer and intramolecular hairpin structures” refers to a state of a nucleic acid molecule, especially a common adaptor primer, in which the sequence is unlikely to form a dimer through pairing with a complementary strand or the like, and unlikely to form a hairpin structure or the like through complementary pairing within the molecule. “Unlikely” allows for a degree of homodimer or hairpin formation that does not substantially affect the subsequent analysis, for example, a tolerance of about 10% or less, 5% or less, 1% or less, 0.5% or less, 0.1% or less, 0.05% or less, or 0.01% or less of the whole. Such a sequence can be determined by using a known technology in the art (SantaLucia, J. Proc Natl Acad Sci USA, 95(4): 1460-1465. (1998); Bommarito et al., Nucleic Acids Res, 28(9): 1929-1934. (2000); and von Ahsen et al., Clin Chem, 47(11): 1956-1961. (2001)), for example, with a computer program such as those used in the Examples (CLC Main Workbench or Primer3).

As used herein, “not . . . have homodimer and intramolecular hairpin structures” refers to a state of a nucleic acid molecule, especially a common adaptor primer, in which the sequence neither forms a dimer by pairing with a complementary strand or the like nor forms a hairpin structure or the like by pairing with a complementary region within the same molecule. Such a sequence can be determined by using a known technology in the art (SantaLucia, J. Proc Natl Acad Sci USA, 95(4): 1460-1465 (1998); Bommarito et al., Nucleic Acids Res, 28(9): 1929-1934 (2000); and von Ahsen et al., Clin Chem, 47(11): 1956-1961 (2001)), for example, by a commercially available computer program or the like used in the Examples (CLC Main Workbench or Primer3).

As used herein, a structure that “can stably form a double strand” refers to a nucleic acid molecule, especially a common adaptor primer, that forms a stable double strand when paired with another nucleic acid molecule such as a template. Such stability can be assessed mainly by temperature, pH, the melting temperature (Tm) calculated from base composition, pHm, or the structure-stabilizing energy (−ΔG at 37° C.). Such a sequence can be determined by using a known technology in the art (SantaLucia, J. Proc Natl Acad Sci USA, 95(4): 1460-1465 (1998); Bommarito et al., Nucleic Acids Res, 28(9): 1929-1934 (2000); and von Ahsen et al., Clin Chem, 47(11): 1956-1961 (2001)), for example, by a commercially available computer program or the like used in the Examples (CLC Main Workbench or Primer3).
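The structure-stabilizing energy can be estimated with the nearest-neighbor model described by SantaLucia (1998). The sketch below sums the unified nearest-neighbor ΔG values at 37° C. for a perfectly matched duplex, adds the terminal initiation terms, and applies the symmetry correction for self-complementary sequences; a more negative ΔG indicates a more stable double strand. It assumes 1 M NaCl and ignores salt correction, dangling ends, and mismatches, all of which dedicated tools handle.

```python
# Unified nearest-neighbor free energies at 37 °C (kcal/mol, 1 M NaCl), SantaLucia (1998).
# Keys are 5'->3' dinucleotide steps on the top strand; the other six steps are
# looked up through their reverse complement (e.g. "TG" is the same stack as "CA").
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_TERMINAL_GC = 0.98
INIT_TERMINAL_AT = 1.03
SYMMETRY_CORRECTION = 0.43

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def duplex_dg37(seq: str) -> float:
    """Predicted ΔG°37 (kcal/mol) of seq paired with its perfect complement."""
    s = seq.upper()
    if len(s) < 2 or set(s) - set("ACGT"):
        raise ValueError("need an unambiguous A/C/G/T sequence of length >= 2")
    dg = 0.0
    for i in range(len(s) - 1):
        step = s[i:i + 2]
        dg += NN_DG37[step] if step in NN_DG37 else NN_DG37[revcomp(step)]
    for end in (s[0], s[-1]):  # one initiation term per duplex end
        dg += INIT_TERMINAL_GC if end in "GC" else INIT_TERMINAL_AT
    if s == revcomp(s):  # self-complementary duplex
        dg += SYMMETRY_CORRECTION
    return dg


print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
```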

As used herein, “not highly homologous” refers to a feature of a nucleic acid molecule, especially a common adaptor primer, of having low homology to all TCR genetic sequences in a database in order to enhance identifiability. For sufficient analysis, the level of homology is preferably, for example, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 25% or less, 20% or less, 15% or less, or 10% or less.
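A minimal way to approximate this check is to slide the candidate sequence (and its reverse complement) along each database sequence without gaps and record the best percent identity. The sketch assumes the database is a dictionary mapping sequence names to sequences (the entry shown is hypothetical), and the 80% default threshold is simply the first example value given above; a production screen would more likely use a gapped aligner such as BLAST.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_window_identity(query: str, reference: str) -> float:
    """Best ungapped percent identity of `query` against any same-length window of `reference`."""
    q, r = query.upper(), reference.upper()
    if not q or len(r) < len(q):
        return 0.0
    best = max(
        sum(a == b for a, b in zip(q, r[start:start + len(q)]))
        for start in range(len(r) - len(q) + 1)
    )
    return 100.0 * best / len(q)


def max_identity_to_database(primer: str, database: dict[str, str]) -> float:
    """Highest identity over all database sequences, checking both primer strands."""
    return max(
        (best_window_identity(q, ref) for ref in database.values() for q in (primer, revcomp(primer))),
        default=0.0,
    )


def is_not_highly_homologous(primer: str, database: dict[str, str], threshold: float = 80.0) -> bool:
    return max_identity_to_database(primer, database) <= threshold


toy_db = {"hypothetical_TCR_1": "AAAAAAAAAA"}
print(max_identity_to_database("AAGG", toy_db))  # 50.0
print(is_not_highly_homologous("AAAA", toy_db))  # False
```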

As used herein, “same level of melting temperature (Tm)” refers to the DNA melting temperatures (Tm) of the sequences or primers to be used being substantially the same, which is a preferred condition for a suitable PCR amplification reaction. “Same level” may refer to a Tm difference of ±15° C. or less, ±14° C. or less, ±13° C. or less, ±12° C. or less, ±11° C. or less, ±10° C. or less, ±9° C. or less, ±8° C. or less, ±7° C. or less, ±6° C. or less, ±5° C. or less, ±4° C. or less, ±3° C. or less, ±2° C. or less, ±1° C. or less, or ±0.5° C. or less. In the Examples, the present invention could be carried out with a Tm difference of 10.9° C.; it is therefore understood that a difference of about 15° C. or less is acceptable as the same level. Tm is the temperature at which 50% of the DNA molecules are denatured into single strands, and it can be determined with a known technology in the art. For example, Tm can be calculated as follows:
(a) for an oligonucleotide shorter than 18 nucleotides: Tm = (A + T) × 2° C. + (G + C) × 4° C.
(b) for an oligonucleotide of 18 nucleotides or longer: Tm = 81.5 + 16.6 × log10[Na+] + 0.41 × (% G+C) − 600/N
where A, C, G, and T are the numbers of As, Cs, Gs, and Ts in the oligonucleotide, % G+C is the percentage of G+C in the oligonucleotide, N is the length (mer) of the oligonucleotide, and [Na+] is the Na+ concentration in the solution (M).
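The two formulas above translate directly into code. In the sketch below, the default Na+ concentration of 0.05 M is an assumption (the text leaves [Na+] open), and “same level” is interpreted as the spread between the highest and lowest Tm in a primer set being within the tolerance, which is one reasonable reading of the ±15° C. criterion.

```python
import math


def tm_celsius(oligo: str, na_molar: float = 0.05) -> float:
    """Tm by rule (a) for oligonucleotides shorter than 18 nt, rule (b) otherwise."""
    s = oligo.upper()
    n = len(s)
    a, c, g, t = (s.count(base) for base in "ACGT")
    if n < 18:
        # (a) Tm = (A + T) x 2 + (G + C) x 4
        return 2.0 * (a + t) + 4.0 * (g + c)
    # (b) Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%G+C) - 600/N
    gc_percent = 100.0 * (g + c) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def same_tm_level(primers: list[str], tolerance: float = 15.0, na_molar: float = 0.05) -> bool:
    """Assumed reading: max(Tm) - min(Tm) across the primer set is within `tolerance`."""
    tms = [tm_celsius(p, na_molar) for p in primers]
    return max(tms) - min(tms) <= tolerance


print(tm_celsius("ACGTACGTAC"))                      # 30.0
print(round(tm_celsius("ACGTACGTACGTACGTACGT"), 1))  # 50.4
print(same_tm_level(["ACGTACGTAC", "GGGGCCCCGG"]))   # True (30.0 vs 40.0)
```

Under this reading, the 10.9° C. difference reported for the Examples would pass the default 15° C. tolerance.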

As used herein, “base length suitable for amplification” refers to a length of a primer or sequence that is suitable for an amplification reaction. Such a length can be determined, for example, by a commercially available computer program or the like used in the Examples (CLC Main Workbench or Primer3). The following documents can also be referred to: SantaLucia, J. Proc Natl Acad Sci USA, 95(4): 1460-1465 (1998); Bommarito et al., Nucleic Acids Res, 28(9): 1929-1934 (2000); and von Ahsen et al., Clin Chem, 47(11): 1956-1961 (2001).

As used herein, “mismatch” refers to the presence of bases that are not identical to each other when genetic sequences are aligned.
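Counting mismatches in an existing alignment is straightforward. The sketch assumes both sequences are already aligned to equal length with "-" marking gaps; gap positions are counted neither as mismatches nor as identities, and percent identity is taken over the full alignment length, which is one common convention among several.

```python
def _check_aligned(aligned_a: str, aligned_b: str) -> None:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")


def count_mismatches(aligned_a: str, aligned_b: str) -> int:
    """Number of aligned positions holding two different bases (gap columns excluded)."""
    _check_aligned(aligned_a, aligned_b)
    return sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != y and "-" not in (x, y)
    )


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical (non-gap) positions as a percentage of the alignment length."""
    _check_aligned(aligned_a, aligned_b)
    if not aligned_a:
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


print(count_mismatches("ACGTAC", "ACCTAC"))  # 1
print(count_mismatches("ACG-AC", "ACGTAC"))  # 0 (the gap is not a mismatch)
```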

As used herein, “% GC (% guanine-cytosine content)” refers to the percentage of G (guanine) and C (cytosine) relative to all bases (including A (adenine), T (thymine), and U (uracil)) in a nucleic acid sequence. A higher GC content results in a higher melting temperature and is also related to gene density and the band structure of chromosomes.

As used herein, “set compatible with all TCR or BCR subclasses” refers to primers prepared in accordance with the descriptions herein for all known subclasses of a target TCR or BCR (e.g., TRBC1 and TRBC2, or TRGC1 and TRGC2, or the like for TCRs; IgG1, IgG2, IgG3, or IgG4 for IgG, or IgA1 or IgA2 for IgA, or the like for BCRs).

As used herein, “protein”, “polypeptide”, “oligopeptide”, and “peptide” are used with the same meaning and refer to an amino acid polymer of any length. Such a polymer may be straight-chain, branched, or cyclic. An amino acid may be natural, non-natural, or altered. The terms may also encompass polypeptides assembled into a complex of a plurality of polypeptide chains. The terms also encompass naturally or artificially altered amino acid polymers. Examples of such alterations include disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, and any other manipulation or alteration (e.g., conjugation with a labeling component). The definition also encompasses, for example, a polypeptide comprising one or more amino acid analogs (e.g., including a non-natural amino acid), peptide-like compounds (e.g., peptoids), and other alterations known in the art.

As used herein, “amino acid” may be natural or non-natural as long as the objective of the present invention is met.

As used herein, “polynucleotide”, “oligonucleotide”, and “nucleic acid” are used with the same meaning and refer to a nucleotide polymer of any length. The terms also encompass an “oligonucleotide derivative” and a “polynucleotide derivative”. An “oligonucleotide derivative” or “polynucleotide derivative” refers to an oligonucleotide or polynucleotide that includes a nucleotide derivative or has an unusual bond between nucleotides; the two terms are used interchangeably. Specific examples of such oligonucleotides include 2′-O-methyl-ribonucleotide, an oligonucleotide derivative in which a phosphodiester bond is converted to a phosphorothioate bond, an oligonucleotide derivative in which a phosphodiester bond is converted to an N3′-P5′ phosphoramidate bond, an oligonucleotide derivative in which a ribose and phosphodiester bond are converted to a peptide nucleic acid bond, an oligonucleotide derivative in which uracil is substituted with C-5 propynyl uracil, an oligonucleotide derivative in which uracil is substituted with C-5 thiazole uracil, an oligonucleotide derivative in which cytosine is substituted with C-5 propynyl cytosine, an oligonucleotide derivative in which cytosine is substituted with phenoxazine-modified cytosine, an oligonucleotide derivative in which ribose in DNA is substituted with 2′-O-propylribose, an oligonucleotide derivative in which ribose is substituted with 2′-methoxyethoxyribose, and the like. Unless specifically noted otherwise, a specific nucleic acid sequence is also intended to encompass conservatively altered variants thereof (e.g., degenerate codon substituted forms) and complementary sequences, in addition to the explicitly shown sequence.
Specifically, a degenerate codon substituted form can be obtained by creating a sequence in which the third position of one or more selected (or all) codons is substituted with a mixed base and/or a deoxyinosine residue (Batzer et al., Nucleic Acids Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994)). As used herein, “nucleic acid” is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide. As used herein, “nucleotide” may be natural or non-natural.
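The substitution described above can be sketched as follows, assuming mixed bases are written with IUPAC ambiguity codes (for example, N for any base) and using the letter I as a placeholder symbol for deoxyinosine. `expand_degenerate` lists the concrete sequences that a mixed-base oligonucleotide represents; it accepts IUPAC codes only, so a sequence containing I raises an error rather than being expanded.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_third_positions(cds: str, codon_indices=None, symbol: str = "N") -> str:
    """Replace the third base of the selected codons (0-based; all codons if None) with `symbol`."""
    bases = list(cds.upper())
    n_codons = len(bases) // 3  # a trailing partial codon is left untouched
    targets = range(n_codons) if codon_indices is None else codon_indices
    for k in targets:
        if not 0 <= k < n_codons:
            raise IndexError(f"codon index {k} out of range")
        bases[3 * k + 2] = symbol
    return "".join(bases)


def expand_degenerate(seq: str) -> list[str]:
    """All concrete sequences represented by an IUPAC-coded sequence."""
    try:
        choices = [IUPAC[b] for b in seq.upper()]
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]}") from None
    return ["".join(p) for p in product(*choices)]


print(degenerate_third_positions("ATGGCC"))                 # ATNGCN
print(degenerate_third_positions("ATGGCC", [1], "I"))      # ATGGCI
print(len(expand_degenerate("ATNGCN")))                     # 16
```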

As used herein, “gene” refers to a factor that defines a genotype. Genes are generally arranged in a certain order on a chromosome. A gene that defines the primary structure of a protein is referred to as a structural gene, and a gene that affects its expression is referred to as a regulatory gene. As used herein, “gene” may refer to a “polynucleotide”, “oligonucleotide”, or “nucleic acid”. A “gene product” is a substance produced based on a gene and refers to a protein, mRNA, or the like.

As used herein, “homology” of genes refers to the level of identity of two or more genetic sequences to one another. In general, having “homology” refers to having a high level of identity or similarity; the higher the homology of two genes, the higher the identity or similarity of their sequences. Whether two genes are homologous can be examined by direct comparison of their sequences or, for nucleic acids, by hybridization under stringent conditions. When two genetic sequences are compared directly, the genes are typically homologous when their DNA sequences are at least 50% identical, preferably at least 70% identical, and more preferably at least 80%, 90%, 95%, 96%, 97%, 98%, or 99% identical. As used herein, “homolog” or “homologous gene product” refers to a protein in another species, preferably a mammal, that exerts the same biological function as a protein constituent element of a complex further described herein.

An amino acid may be referred to herein by its commonly known three-letter symbol or by the one-letter symbol recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Similarly, a nucleotide may be referred to by its commonly recognized one-letter code. Herein, similarity, identity, and homology of amino acid sequences and base sequences are calculated by using the sequence analysis tool BLAST with default parameters. For instance, identity can be searched by using NCBI's BLAST 2.2.9 (published on 5 Dec. 2004). The value of identity herein generally refers to the value obtained when the above-described BLAST is used to align sequences under default conditions. However, when a higher value is output by changing a parameter, the highest value is considered the value of identity. When identity is assessed in a plurality of regions, the highest value among them is considered the value of identity. Similarity is a numerical value that counts similar amino acids in addition to identical ones.

As used herein, “polynucleotide that hybridizes under stringent conditions” refers to conventional, well-known conditions in the art. Such a polynucleotide can be obtained by colony hybridization, plaque hybridization, Southern blot hybridization, or the like, using a polynucleotide selected from the polynucleotides of the present invention as a probe. Specifically, such a polynucleotide refers to a polynucleotide that can be identified by performing hybridization at 65° C. in the presence of 0.7-1.0 M NaCl using a filter on which DNA derived from a colony or plaque is immobilized, and then washing the filter at 65° C. with an SSC (saline-sodium citrate) solution at 0.1- to 2-fold concentration (a 1-fold SSC solution consists of 150 mM sodium chloride and 15 mM sodium citrate). Hybridization can be performed in accordance with the methods described in experimental publications such as Molecular Cloning 2nd ed., Current Protocols in Molecular Biology, Supplement 1-38, and DNA Cloning 1: Core Techniques, A Practical Approach, Second Edition, Oxford University Press (1995). In this regard, a sequence consisting only of A or only of T is preferably excluded from the sequences that hybridize under stringent conditions. Thus, the polypeptide used in the present invention (e.g., transthyretin and the like) encompasses polypeptides encoded by a nucleic acid molecule that hybridizes under stringent conditions to a nucleic acid molecule encoding a polypeptide particularly described in the present invention. Low stringency conditions include hybridization for 18 hours at 40° C. in a buffer solution comprising 35% formamide, 5×SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% polyvinylpyrrolidone (PVP), 0.02% BSA, 100 μg/ml denatured salmon sperm DNA, and 10% (w/v) dextran sulfate; washing for 1-5 hours at 55° C. in a buffer solution consisting of 2×SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS; and washing for 1.5 hours at 60° C. in a buffer solution consisting of 2×SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS.

As used herein, a “purified” substance or biological agent (e.g., nucleic acid, protein, or the like) refers to one from which at least some of the agents naturally accompanying it have been removed. Thus, the purity of a biological agent in a purified biological agent is generally higher than in its normal state (i.e., it is concentrated). The term “purified” as used herein preferably refers to the presence of at least 75 wt. %, more preferably at least 85 wt. %, still more preferably at least 95 wt. %, and most preferably at least 98 wt. % of biological agents of the same type. A substance used in the present invention is preferably a “purified” substance.

As used herein, a “corresponding” amino acid or nucleic acid refers to an amino acid or nucleotide in a given polypeptide or polynucleotide molecule that has, or is expected to have, an action similar to that of a specified amino acid or nucleotide in the polypeptide or polynucleotide used as the baseline of comparison. For an enzyme molecule in particular, it refers to an amino acid that is at the same position in the active site and makes the same contribution to catalytic activity. For instance, for an antisense molecule, it may be the similar portion in an ortholog corresponding to a specific portion of the antisense molecule. A corresponding amino acid may be a specific amino acid that has undergone cysteinylation, glutathionylation, S-S bond formation, oxidation (e.g., oxidation of a methionine side chain), formylation, acetylation, phosphorylation, glycosylation, myristylation, or the like. Alternatively, a corresponding amino acid may be an amino acid responsible for dimerization. Such a “corresponding” amino acid or nucleic acid may also span a region or domain of a certain range (e.g., V region, D region, or the like); such a region or domain is called a “corresponding” region or domain herein.

As used herein, “fragment” refers to a polypeptide or polynucleotide with a sequence length of 1 to n−1 relative to the full-length polypeptide or polynucleotide (of length n). The length of a fragment can be changed as appropriate in accordance with the objective. Examples of the lower limit of such a length include 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, and more amino acids for a polypeptide; lengths represented by an integer not specifically listed herein (e.g., 11) can also be suitable as a lower limit. Further, examples of the length include 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, and more nucleotides for a polynucleotide; lengths represented by an integer not specifically listed herein (e.g., 11) can also be suitable as a lower limit. As used herein, when the full-length version functions as a marker, a fragment is understood to be within the scope of the present invention as long as the fragment itself also functions as a marker.
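Under the definition above, a fragment of a sequence of length n has a length from 1 to n−1. The sketch below enumerates contiguous fragments from a chosen lower limit upward and counts them; it assumes fragments are contiguous subsequences, which is the usual reading but is not spelled out in the text. The count grows roughly quadratically with n, which is one practical reason for setting a lower length limit.

```python
from typing import Iterator


def fragments(seq: str, min_len: int = 1) -> Iterator[str]:
    """Yield every contiguous fragment of length min_len..n-1 (the full length is excluded)."""
    n = len(seq)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            yield seq[start:start + length]


def count_fragments(n: int, min_len: int = 1) -> int:
    """Number of contiguous fragments: sum over lengths k in [min_len, n-1] of (n - k + 1)."""
    return sum(n - k + 1 for k in range(min_len, n))


print(list(fragments("ACGT", 3)))  # ['ACG', 'CGT']
print(count_fragments(4))          # 9
```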

The term “activity” according to the present invention refers to a function of a molecule in the broadest sense. Activity, although not intended to be limiting, generally includes a biological function, biochemical function, physical function, therapeutic activity, diagnostic activity, and chemical function of a molecule. Examples of activity include enzymatic activity; an ability to interact with another molecule; an ability to activate, promote, stabilize, inhibit, suppress, or destabilize a function of another molecule; stability; and an ability to localize at a specific position in a cell. When applicable, the term also relates to a function of a protein complex in the broadest sense.

As used herein, “expression” of a gene, polynucleotide, polypeptide, or the like refers to the gene or the like being converted into another form through a certain action in vivo. Preferably, expression refers to a gene, polynucleotide, or the like being transcribed and translated into the form of a polypeptide, but transcription into an mRNA can also be one form of expression. More preferably, such a polypeptide form may be one that has undergone post-translational processing (a derivative as referred to herein).

A functional equivalent, such as an isotype, of a molecule such as IgG used in the present invention can be found by searching a database or the like. As used herein, “search” refers to using a certain nucleic acid base sequence electronically, biologically, or by another method, preferably electronically, to find another nucleic acid base sequence having a specific function and/or property. Examples of electronic search include, but are not limited to, BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)), FASTA (Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85: 2444-2448 (1988)), the Smith and Waterman method (Smith and Waterman, J. Mol. Biol. 147: 195-197 (1981)), the Needleman and Wunsch method (Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970)), and the like. BLAST is typically used. Examples of biological search include, but are not limited to, stringent hybridization, a macroarray in which genomic DNA is applied to a nylon membrane or the like, a microarray in which genomic DNA is applied to a glass plate (microarray assay), PCR, in situ hybridization, and the like. Herein, a gene used in the present invention is intended to include corresponding genes identified by such electronic or biological search.
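Of the search methods listed, the Needleman and Wunsch method is compact enough to sketch. The code below performs a global alignment with a linear gap penalty, traces back one optimal alignment, and ranks database entries by alignment score. The scoring values (+1 match, −1 mismatch, −2 per gap) and the dictionary-based toy database are assumptions for illustration; BLAST, which is typically used, is a heuristic local-alignment tool with different scoring and statistics.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment score and one optimal alignment (gaps shown as '-')."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring diagonal moves.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def search(query: str, database: dict[str, str], **scoring: int) -> list[tuple[int, str]]:
    """Rank database entries by global alignment score against the query (best first)."""
    hits = [(needleman_wunsch(query, seq, **scoring)[0], name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
print(search("ACGT", {"x": "ACGT", "y": "TTTT"}))  # [(4, 'x'), (-2, 'y')]
```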

As a functional equivalent of the present invention, it is possible to use an amino acid sequence having one or more amino acid insertions, substitutions, or deletions, or additions to one or both ends. As used herein, “one or more amino acid insertions, substitutions, or deletions, or additions to one or both ends” in an amino acid sequence refers to an alteration, such as a substitution of a plurality of amino acids, to an extent that can occur by a well-known technical method such as site-directed mutagenesis or by natural mutation. An altered amino acid sequence of a molecule can have, for example, 1-30, preferably 1-20, more preferably 1-9, still more preferably 1-5, and especially preferably 1-2 amino acid insertions, substitutions, or deletions, or additions to one or both ends. An altered amino acid sequence may be an amino acid sequence having one or more (preferably 1 or several, or 1, 2, 3, or 4) conservative substitutions in the amino acid sequence of a molecule such as CD98. “Conservative substitution” refers to the substitution of one or more amino acid residues with other chemically similar amino acid residues so as not to substantially alter the function of a protein. Examples include the substitution of a hydrophobic residue with another hydrophobic residue and the substitution of a polar residue with another polar residue having the same charge. Functionally similar amino acids that can be substituted in this manner are known in the art for each amino acid. Specific examples of nonpolar (hydrophobic) amino acids include alanine, valine, isoleucine, leucine, proline, tryptophan, phenylalanine, methionine, and the like, and specific examples of polar (neutral) amino acids include glycine, serine, threonine, tyrosine, glutamine, asparagine, cysteine, and the like. Examples of positively charged (basic) amino acids include arginine, histidine, lysine, and the like. Further, examples of negatively charged (acidic) amino acids include aspartic acid, glutamic acid, and the like.
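The substitution classes listed above can be turned into a simple identity and similarity calculation for two aligned amino acid sequences, where similarity counts identical positions plus positions holding two residues from the same class. This is an illustrative sketch only: BLAST reports similarity ("positives") from a substitution matrix such as BLOSUM62 rather than from these four classes, and the gap handling here (gap columns skipped but kept in the denominator) is an assumption.

```python
# Residue classes taken from the conservative-substitution examples in the text.
GROUPS = {
    "nonpolar": set("AVILPWFM"),
    "polar_neutral": set("GSTYQNC"),
    "basic": set("RHK"),
    "acidic": set("DE"),
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def is_conservative(a: str, b: str) -> bool:
    """True if both residues fall into the same listed class."""
    ga, gb = GROUP_OF.get(a.upper()), GROUP_OF.get(b.upper())
    return ga is not None and ga == gb


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over an equal-length alignment ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0, 0.0
    identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        if x == y:
            identical += 1
            similar += 1
        elif is_conservative(x, y):
            similar += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * similar / n


print(identity_and_similarity("ACDKL", "ACEKI"))  # (60.0, 100.0): D/E and L/I are conservative
```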

As used herein, “marker (substance, protein, or gene (nucleic acid))” refers to a substance that can serve as an indicator for tracking whether a target is in, or at risk of being in, a certain condition (e.g., a normal cell state, transformed state, diseased state, disorder state, level or presence of proliferation capability, differentiated state, or the like). Examples of such a marker include genes (nucleic acid = DNA level), gene products (mRNA, protein, and the like), metabolites, enzymes, and the like. In the present invention, detection, diagnosis, preliminary detection, prediction, or prediagnosis of a certain state (e.g., a disease such as a differentiation disorder) can be achieved by using an agent or means specific to a marker associated with such a state, or a composition, kit, system, or the like comprising the same. As used herein, “gene product” refers to a protein or mRNA encoded by a gene.

As used herein, “subject” refers to a target subjected to the diagnosis, detection, or the like of the present invention (e.g., an organism such as a human, or an organ or cell extracted from an organism, or the like).

As used herein, “sample” refers to any substance obtained from a subject or the like; for example, an eye cell and the like are encompassed. Those skilled in the art can appropriately select a preferred sample based on the descriptions herein.

As used herein, “agent” is used broadly and may be any substance or other element (e.g., energy, radiation, heat, electricity, and other forms of energy) as long as the intended objective can be achieved. Examples of such a substance include, but are not limited to, protein, polypeptide, oligopeptide, peptide, polynucleotide, oligonucleotide, nucleotide, nucleic acid (including, for example, DNAs such as cDNA and genomic DNA, and RNAs such as mRNA), polysaccharide, oligosaccharide, lipid, and organic small molecule (e.g., hormone, ligand, signal-transmitting substance, molecule synthesized by combinatorial chemistry, small molecule that can be used as a medicine (e.g., small molecule ligand and the like), and composite molecules thereof). Typical examples of an agent specific to a polynucleotide include, but are not limited to, a polynucleotide having complementarity with a certain sequence homology (e.g., 70% or greater sequence identity) to a sequence of the polynucleotide, a polypeptide such as a transcription factor that binds to a promoter region, and the like. Typical examples of an agent specific to a polypeptide include, but are not limited to, an antibody directed specifically to the polypeptide or a derivative or analog thereof (e.g., single-chain antibody), a specific ligand or receptor when the polypeptide is a receptor or ligand, a substrate when the polypeptide is an enzyme, and the like.

As used herein, “detecting agent” broadly refers to all agents capable of detecting a target of interest.

As used herein, “diagnostic agent” broadly refers to all agents capable of diagnosing a condition of interest (e.g., disease or the like).

The detecting agent of the present invention may be a complex or composite molecule in which another substance (e.g., a label or the like) is bound to a portion enabling detection (e.g., an antibody or the like). As used herein, “complex” or “composite molecule” refers to any construct comprising two or more portions. For instance, when one portion is a polypeptide, the other portion may be a polypeptide or another substance (e.g., sugar, lipid, nucleic acid, other carbohydrate, or the like). As used herein, two or more constituent portions of a complex may be bound by a covalent bond or any other bond (e.g., hydrogen bond, ionic bond, hydrophobic interaction, Van der Waals force, or the like). When two or more portions are polypeptides, the complex may be called a chimeric polypeptide. Thus, “complex” as used herein includes molecules formed by linking a plurality of types of molecules such as a polypeptide, polynucleotide, lipid, sugar, or small molecule.

As used herein, “interaction” between two substances refers to a force (e.g., intermolecular force (Van der Waals force), hydrogen bond, hydrophobic interaction, or the like) acting between one substance and the other. Generally, two substances that have interacted are in an associated or bound state.

As used herein, the term “bond” refers to a physical or chemical interaction between two substances or between combinations thereof. A bond includes an ionic bond, non-ionic bond, hydrogen bond, Van der Waals bond, hydrophobic interaction, and the like. A physical interaction (bond) may be direct or indirect. An indirect physical interaction (bond) is mediated by, or is due to the effect of, another protein or compound. A direct bond refers to an interaction that does not occur through, or due to the effect of, another protein or compound and does not substantially involve another intermediate. The degree of expression of the marker of the present invention or the like can be measured by measuring a bond or interaction.

Thus, an “agent” (or detecting agent or the like) that “specifically” interacts with (or binds to) a biological agent such as a polynucleotide or polypeptide, as used herein, encompasses agents whose affinity to that biological agent is typically similar to or higher than, and preferably significantly (e.g., statistically significantly) higher than, their affinity to other unrelated polynucleotides or polypeptides (especially those with less than 30% identity). Such affinity can be measured by, for example, a hybridization assay, binding assay, or the like.

As used herein, “specific” interaction (or bond) of a first substance or agent with a second substance or agent refers to the first substance or agent interacting with (or binding to) the second substance or agent with a higher level of affinity than to substances or agents other than the second substance or agent (especially other substances or agents in a sample comprising the second substance or agent). Examples of an interaction (or bond) specific to a substance or agent include, but are not limited to, a ligand-receptor reaction, hybridization of nucleic acids, an antigen-antibody reaction of proteins, an enzyme-substrate reaction, and the like; when both a nucleic acid and a protein are involved, examples include a reaction between a transcription factor and its binding site, as well as protein-lipid interaction, nucleic acid-lipid interaction, and the like. Thus, when the substances or agents are both nucleic acids, a first substance or agent “specifically interacting” with a second substance or agent encompasses the first substance or agent having at least partial complementarity to the second substance or agent. Further, when the substances or agents are both proteins, examples of a first substance or agent “specifically” interacting with (or binding to) a second substance or agent include, but are not limited to, interaction by an antigen-antibody reaction, interaction by a receptor-ligand reaction, enzyme-substrate interaction, and the like. When the two substances or agents are a protein and a nucleic acid, a first substance or agent “specifically” interacting with (or binding to) a second substance or agent encompasses an interaction (or bond) between a transcription factor and the binding region of the nucleic acid molecule targeted by the transcription factor.

As used herein, “detection” or “quantification” of polynucleotide or polypeptide expression can be accomplished by using a suitable method including, for example, an immunological measuring method and measurement of mRNA, including the binding or interaction of a marker detecting agent. In the present invention, however, measurement can also be performed based on the amount of PCR product. Examples of a molecular biological measuring method include northern blot, dot blot, PCR, and the like. Examples of an immunological measuring method include ELISA using a microtiter plate, RIA, fluorescent antibody method, luminescence immunoassay (LIA), immunoprecipitation (IP), single radial immunodiffusion (SRID), turbidimetric immunoassay (TIA), western blot, immunohistochemical staining, and the like. Further, examples of a quantification method include ELISA, RIA, and the like. Quantification may also be performed by a gene analysis method using an array (e.g., DNA array, protein array). DNA arrays are outlined extensively in Saibo Kogaku Bessatsu “DNA Maikuroarei to Saishin PCR ho” [Cellular Engineering, Extra Issue, “DNA Microarrays and Latest PCR Methods”], Shujunsha. Protein arrays are discussed in detail in Nat Genet. 2002 December; 32 Suppl: 526-32. Examples of a method of analyzing gene expression include, but are not limited to, RT-PCR, RACE, SSCP, immunoprecipitation, the two-hybrid system, in vitro translation, and the like, in addition to the methods discussed above. Such additional analysis methods are described in, for example, Genomu Kaiseki Jikkenho Nakamura Yusuke Labo Manyuaru [Genome Analysis Experimental Method: Yusuke Nakamura Lab Manual], Ed. by Yusuke Nakamura, Yodosha (2002), and the like. The entirety of the descriptions therein is incorporated herein by reference.

As used herein, “amount of expression” refers to the amount of polypeptide, mRNA, or the like expressed in a cell, tissue, or the like of interest. Examples include the amount of expression of the polypeptide of the present invention at the protein level, assessed by any suitable method including an immunological measuring method such as ELISA, RIA, fluorescent antibody method, western blot, or immunohistochemical staining using the antibody of the present invention, and the amount of expression of the polypeptide used in the present invention at the mRNA level, assessed by any suitable method including a molecular biological measuring method such as northern blot, dot blot, or PCR. “Change in amount of expression” refers to an increase or decrease in the amount of expression of the polypeptide used in the present invention at the protein level or mRNA level, assessed by any suitable method including the above-described immunological or molecular biological measuring methods. Various types of detection or diagnosis based on a marker can be performed by measuring the amount of expression of that marker.

As used herein, “decrease” or “suppression” of an activity or expression product (e.g., protein, transcript (RNA or the like)), or synonyms thereof, refers to a decrease in the amount, quality, or effect of a specific activity, transcript, or protein, or to an activity that decreases the same.

As used herein, “increase” or “activation” of an activity or expression product (e.g., protein, transcript (RNA or the like)), or synonyms thereof, refers to an increase in the amount, quality, or effect of a specific activity, transcript, or protein, or to an activity that increases the same.

Thus, it is understood that the activity of an immune system can be detected or screened by using, as an indicator, a regulatory ability such as decrease, suppression, increase, or activation of the marker of the present invention.

As used herein, “means” refers to anything that can be a tool for accomplishing an objective (e.g., detection, diagnosis, therapy). In particular, “means for selective recognition (detection)” as used herein refers to means capable of recognizing (detecting) a certain subject distinctly from others.

The present invention is useful as an indicator of an immune system condition. Thus, the present invention can be used to identify an indicator of an immune system condition and to determine the condition of a disease.

As used herein, a “(nucleic acid) primer” refers to a substance required for initiating the reaction of a polymeric compound to be synthesized in a polymer-synthesizing enzyme reaction. In the synthesis of a nucleic acid molecule, a nucleic acid molecule (e.g., DNA, RNA, or the like) complementary to a portion of the sequence of the polymeric compound to be synthesized can be used. A primer can be used herein as a marker detecting means.

Examples of a nucleic acid molecule generally used as a primer include those having a nucleic acid sequence with a length of at least 8 contiguous nucleotides that is complementary to a nucleic acid sequence of a gene of interest (e.g., a marker of the present invention). Such a nucleic acid sequence may be a nucleic acid sequence with a length of preferably at least 9 contiguous nucleotides, more preferably at least 10 contiguous nucleotides, still more preferably at least 11 contiguous nucleotides, at least 12 contiguous nucleotides, at least 13 contiguous nucleotides, at least 14 contiguous nucleotides, at least 15 contiguous nucleotides, at least 16 contiguous nucleotides, at least 17 contiguous nucleotides, at least 18 contiguous nucleotides, at least 19 contiguous nucleotides, at least 20 contiguous nucleotides, at least 25 contiguous nucleotides, at least 30 contiguous nucleotides, at least 40 contiguous nucleotides, or at least 50 contiguous nucleotides. A nucleic acid sequence used as a probe comprises a nucleic acid sequence that is at least 70% homologous, more preferably at least 80% homologous, still more preferably at least 90% homologous, or at least 95% homologous to the aforementioned sequence. A sequence suitable as a primer may vary depending on the properties of the sequence intended for synthesis (amplification). However, those skilled in the art are capable of designing an appropriate primer in accordance with the intended sequence. Design of such a primer is well known in the art and may be performed manually or by using a computer program (e.g., LASERGENE, PrimerSelect, or DNAStar).

The primers according to the present invention can be used as a primer set consisting of two or more types of primers.

The primers and primer set according to the present invention can be used, in accordance with a common method, as primers and a primer set in a known method of detecting a gene of interest by a nucleic acid amplification method such as PCR, RT-PCR, real-time PCR, in situ PCR, or LAMP.

The primer set according to the present invention can be selected such that a nucleotide sequence of a protein of interest, such as a T cell receptor molecule, can be amplified by a nucleic acid amplification method such as PCR. Nucleic acid amplification methods are well known, and selection of a primer pair in a nucleic acid amplification method is evident to those skilled in the art. For instance, primers can be selected in PCR such that one of the two primers (primer pair) is paired with the plus strand of a double-stranded DNA of a protein of interest such as a T cell receptor molecule, the other primer is paired with the minus strand of the double-stranded DNA, and the chain extended by one of the primers is paired with the other primer. The primer of the present invention can be chemically synthesized based on the nucleotide sequence disclosed herein. Preparation of a primer is well known and can be carried out in accordance with, for example, “Molecular Cloning, A Laboratory Manual 2nd ed.” (Cold Spring Harbor Press (1989)) and “Current Protocols in Molecular Biology” (John Wiley & Sons (1987-1997)).
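The primer-pair rule above can be illustrated with a minimal in-silico PCR sketch in Python. It assumes exact primer matching on a plus-strand template. The helper names (`reverse_complement`, `in_silico_amplicon`) and the toy template are hypothetical. This is not part of the disclosed method, and real primers tolerate mismatches that this sketch ignores.

```python
from typing import Optional

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predict the product of a primer pair on a plus-strand template.

    The forward primer carries plus-strand sequence, so it anneals to the minus
    strand. The reverse primer anneals to the plus strand, so its reverse
    complement must appear in the plus strand downstream of the forward primer.
    Only exact matches are considered.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse.upper())
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTACGTAGCTAGGCTTACGGATCCAA"  # toy sequence, not a TCR/BCR gene
print(in_silico_amplicon(template, forward="ACGTAG", reverse="TCCGTA"))
# prints ACGTAGCTAGGCTTACGGA
```

In this example, the extension product of the forward primer contains the binding site of the reverse primer, and vice versa. That is why the product spans both primer sites.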

As used herein, “probe” refers to a substance that can serve as a means for searching and is used in a biological experiment such as in vitro and/or in vivo screening. Examples thereof include, but are not limited to, a nucleic acid molecule comprising a specific base sequence, a peptide comprising a specific amino acid sequence, a specific antibody, a fragment thereof and the like. As used herein, a probe can be used as a marker detecting means.

A nucleic acid molecule generally used as a probe includes those having a nucleic acid sequence with a length of at least 8 contiguous nucleotides that is homologous or complementary to a nucleic acid sequence of a gene of interest. Such a nucleic acid sequence may be a nucleic acid sequence with a length of preferably at least 9 contiguous nucleotides, more preferably at least 10 contiguous nucleotides, still more preferably at least 11 contiguous nucleotides, at least 12 contiguous nucleotides, at least 13 contiguous nucleotides, at least 14 contiguous nucleotides, at least 15 contiguous nucleotides, at least 20 contiguous nucleotides, at least 25 contiguous nucleotides, at least 30 contiguous nucleotides, at least 40 contiguous nucleotides, or at least 50 contiguous nucleotides. A nucleic acid sequence used as a probe comprises a nucleic acid sequence that is at least about 70% homologous, more preferably at least about 80% homologous, still more preferably at least about 90% homologous, or at least about 95% homologous to the aforementioned sequence.

In one embodiment, the detecting agent of the present invention may be labeled. Alternatively, the detecting agent of the present invention may be bound to a tag.

As used herein, “label” refers to an entity (e.g., a substance, energy, an electromagnetic wave or the like) for distinguishing a molecule or substance of interest from others. Labeling methods include an RI (radioisotope) method, a fluorescence method, a biotin method, a chemiluminescent method and the like. When a plurality of markers of the present invention, or agents or means for capturing the same, are labeled by a fluorescence method, labeling is performed with labeling substances having different fluorescent emission maximum wavelengths. The difference in fluorescent emission maximum wavelengths is preferably 10 nm or greater. When labeling a ligand, any label that does not affect its function can be used; however, Alexa™ Fluor is desirable as a fluorescent substance. Alexa™ Fluor is a water-soluble fluorescent dye obtained by modifying coumarin, rhodamine, fluorescein, cyanine or the like, and the series covers a wide range of fluorescence wavelengths. Relative to other fluorescent dyes for the corresponding wavelength, Alexa™ Fluor is very stable and bright and has low pH sensitivity. Combinations of fluorescent dyes whose fluorescence maximum wavelengths differ by 10 nm or more include a combination of Alexa™ 555 and Alexa™ 633, a combination of Alexa™ 488 and Alexa™ 555, and the like. When a nucleic acid is labeled, any substance that can bind to a base portion thereof can be used; however, it is preferable to use a cyanine dye (e.g., Cy3, Cy5 or the like of the CyDye™ series), a rhodamine 6G reagent, N-acetoxy-N2-acetylaminofluorene (AAF), AAIF (an iodine derivative of AAF) or the like. Examples of fluorescent substances whose fluorescent emission maximum wavelengths differ by 10 nm or more include a combination of Cy5 and a rhodamine 6G reagent, a combination of Cy3 and fluorescein, a combination of a rhodamine 6G reagent and fluorescein, and the like. The present invention can utilize such a label to make a subject of interest detectable by the detecting means to be used. Such alteration is known in the art, and those skilled in the art can appropriately carry out such a method in accordance with the label and the subject of interest.
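The 10 nm separation criterion can be checked mechanically. The following Python sketch lists dye pairs whose emission maxima are closer than a given gap. The function name `close_pairs` is hypothetical, and the emission maxima in the usage are approximate values that should be confirmed against the supplier's data.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def close_pairs(emission_max_nm: Dict[str, float],
                min_gap_nm: float = 10.0) -> List[Tuple[str, str]]:
    """Return dye pairs whose emission maxima differ by less than min_gap_nm."""
    return [
        (dye_a, dye_b)
        for (dye_a, nm_a), (dye_b, nm_b) in combinations(emission_max_nm.items(), 2)
        if abs(nm_a - nm_b) < min_gap_nm
    ]


# Approximate emission maxima (nm); confirm against the supplier's data sheets.
panel = {"Alexa Fluor 488": 519, "Alexa Fluor 555": 565, "Alexa Fluor 633": 647}
print(close_pairs(panel))  # prints []
```

An empty result means every pair in the panel satisfies the 10 nm preference stated above.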

As used herein, “tag” refers to a substance for distinguishing a molecule by a specific recognition mechanism such as receptor-ligand binding. More specifically, it is a substance serving as a binding partner for binding a specific substance (e.g., in a relationship such as biotin-avidin or biotin-streptavidin). A tag can be encompassed in the scope of “label”. Accordingly, a specific substance to which a tag is bound can be distinguished by contacting it with a substrate to which the binding partner of the tag sequence is bound. Such tags and labels are well known in the art. Typical tag sequences include, but are not limited to, myc tag, His tag, HA, Avi tag and the like. Such a tag may be bound to the marker or marker detecting agent of the present invention.

In this regard, “test sample” only needs to be a cell of interest or a substance derived therefrom that is considered to comprise an element enabling gene expression.

As used herein, “diagnosis” refers to identifying various parameters associated with a disease, disorder, condition or the like in a subject to determine the current or future state of such a disease, disorder, or condition. The condition in the body can be examined by using the method, apparatus, or system of the present invention. Such information can be used to select and determine various parameters, such as a formulation or method for treatment or prevention to be administered, for the disease, disorder, or condition in a subject or the like. As used herein, “diagnosis” when narrowly defined refers to diagnosis of the current state, but when broadly defined includes “early diagnosis”, “predictive diagnosis”, “prediagnosis” and the like. Since the diagnostic method of the present invention can in principle utilize what comes out of a body and can be conducted away from a medical practitioner such as a physician, the present invention is industrially useful. To clarify that the method can be conducted away from a medical practitioner such as a physician, the term as used herein may be particularly referred to as “assisting” predictive diagnosis, prediagnosis, or diagnosis.

The formulation procedure for a diagnostic agent or the like of the present invention as a medicament or the like is known in the art and is described, for example, in the Japanese Pharmacopoeia, the United States Pharmacopeia, pharmacopeias of other countries, or the like. Thus, those skilled in the art can determine the amount to be used without undue experimentation from the descriptions herein.

As used herein, “complete match with all C region allelic sequences of the same isotype” refers to a match with all C region allelic sequences of the same isotype when they are aligned. C region allelic sequences are not all identical even within the same isotype. For that reason, using a sequence that is a complete match with all C region allelic sequences of the same isotype is advantageous for immediately determining the isotype once the sequence of an amplicon is determined.

As used herein, “trimming” refers to removal of an unsuitable portion in gene analysis. Trimming is performed by removing low-quality regions from both ends of a read, a partial sequence of an artificial nucleic acid sequence imparted in an experimental procedure, or both. Trimming can be performed with software known in the art or by referring to references (for example, cutadapt, http://journal.embnet.org/index.php/embnetjournal/article/view/200/ (EMBnet.journal, 2011); fastq-mcf, Aronesty E., The Open Bioinformatics Journal (2013) 7, 1-8 (DOI: 10.2174/1875036201307010001); and fastx-toolkit, http://hannonlab.cshl.edu/fastx_toolkit/ (2009)). For an adaptor sequence or an artificial nucleic acid sequence, trimming is preferably accomplished by the steps of: deleting low-quality regions from both ends of a read; deleting, from both ends of the read, a region that matches an adaptor sequence over 10 bp or more; and using the read as a high-quality read in analysis when the remaining length is 200 bp or more (TCR) or 300 bp or more (BCR).
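The preferred trimming steps can be sketched as follows. This is a minimal illustration, not the cutadapt, fastq-mcf or fastx-toolkit implementation. The Phred quality threshold of 20, exact-match adaptor detection, and all function names are assumptions. Only the 10 bp adaptor overlap and the 200 bp (TCR) / 300 bp (BCR) length cut-offs come from the text above. The P20EA sequence is used here purely as an example adaptor.

```python
from typing import List, Optional, Tuple

# Minimum remaining read length, taken from the text (TCR: 200 bp, BCR: 300 bp).
MIN_LENGTH = {"TCR": 200, "BCR": 300}


def trim_low_quality(seq: str, qual: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Remove bases with Phred quality below min_q from both ends (min_q is assumed)."""
    start, end = 0, len(seq)
    while start < end and qual[start] < min_q:
        start += 1
    while end > start and qual[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_adaptor_ends(seq: str, adaptor: str, min_overlap: int = 10) -> str:
    """Remove adaptor-derived sequence of at least min_overlap bp from either end.

    5' end: the read begins with a suffix of the adaptor.
    3' end: the read ends with a prefix of the adaptor.
    Exact matching only; the longest overlap is removed first.
    """
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.startswith(adaptor[-k:]):
            seq = seq[k:]
            break
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            seq = seq[:-k]
            break
    return seq


def high_quality_read(seq: str, qual: List[int], adaptor: str,
                      receptor: str = "TCR", min_q: int = 20) -> Optional[str]:
    """Apply both trimming steps; keep the read only if it is long enough."""
    seq, _ = trim_low_quality(seq, qual, min_q)
    seq = trim_adaptor_ends(seq, adaptor)
    return seq if len(seq) >= MIN_LENGTH[receptor] else None
```

Quality trimming is done before adaptor trimming so that low-quality bases at the read ends do not hide an adaptor match. The same 240 bp read would be kept as a TCR read but discarded as a BCR read.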

As used herein, “suitable length” refers to a length adapted to analysis when alignment or the like is performed in the gene analysis of the present invention. For example, such a length can be determined to be a length including 100 bases toward the D region on the V region from the sequencing initiation position on the C region. In the present invention, examples of a suitable length include, but are not limited to, 200 nucleotides or longer, preferably 250 nucleotides or longer, for TCRs, and 300 nucleotides or longer, preferably 350 nucleotides or longer, for BCRs.

As used herein, “input sequence set” refers to a set of target sequences for TCR or BCR repertoire analysis in the gene analysis of the present invention.

As used herein, “gene region” refers to each of the V region, D region, J region, C region and the like. Such gene regions are known in the art and can be appropriately determined by referring to a database or the like. As used herein, “homology” of genes refers to the level of identity of two or more genetic sequences to one another. In general, having “homology” refers to having a high level of identity or similarity. Thus, a higher level of homology between two genes means a higher level of identity or similarity between their sequences. Whether two genes are homologous can be examined by direct comparison of their sequences or, for nucleic acids, by hybridization under stringent conditions. As used herein, “homology search” refers to a search for homology. Preferably, homology is searched in silico by using a computer.

As used herein, “approximate” refers to having a high level of homology when a homology search is performed. Homology search software (BLAST, FASTA or the like) generally lists results in descending order of homology. Thus, approximation is possible by appropriately selecting a highly ranked result.

As used herein, “closest” refers to having the highest level of homology when a homology search is performed. When homology is searched with software, the result ranked first is selected.
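The distinction between “approximate” (highly ranked hits) and “closest” (the top hit) can be sketched in Python. The sketch uses the standard library's `difflib.SequenceMatcher` only as a stand-in similarity score, in place of BLAST or FASTA. The reference names and sequences are toy placeholders, not IMGT alleles, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def rank_references(query: str, references: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score every reference against the query and sort by descending similarity."""
    scored = [
        (name, SequenceMatcher(None, query, ref, autojunk=False).ratio())
        for name, ref in references.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def approximate_hits(query: str, references: Dict[str, str], top_n: int = 3) -> List[str]:
    """'Approximate': the highly ranked hits."""
    return [name for name, _ in rank_references(query, references)[:top_n]]


def closest_reference(query: str, references: Dict[str, str]) -> str:
    """'Closest': the hit ranked first."""
    return rank_references(query, references)[0][0]


refs = {"V1*01": "ACGTACGTAC", "V1*02": "ACGTACGTTC", "V2*01": "GGGGCCCCAA"}
print(closest_reference("ACGTACGTTC", refs))    # prints V1*02
print(approximate_hits("ACGTACGTTC", refs, 2))  # prints ['V1*02', 'V1*01']
```

The top hit returned by `closest_reference` corresponds to the “reference allele” of the query in the sense defined below.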

As used herein, “reference allele” refers to the allele in a reference database that results in a match when a homology search is performed.

As used herein, “alignment” (or “align”) in bioinformatics refers to the arrangement of the primary structures of biomolecules such as DNA, RNA, or protein so that similar regions are identifiable, or the act of so arranging them. Alignment can provide a clue for understanding the functional, structural, or evolutionary relationship of sequences.

As used herein, “assign” refers to allocating specific information, such as a gene name, function, or characteristic region (e.g., V region, J region or the like), to a certain sequence (e.g., a nucleic acid sequence, protein sequence or the like). Specifically, this is accomplished by inputting or linking specific information to a certain sequence or the like.

As used herein, “CDR3” refers to the third complementarity-determining region (CDR). In this regard, a CDR is a region that directly contacts an antigen and undergoes particularly large changes among variable regions, and is referred to as a hypervariable region. Each variable region of a light chain and a heavy chain has three CDRs (CDR1-CDR3) and four FRs (FR1-FR4) surrounding the three CDRs. Since a CDR3 region is considered to span the V region, D region and J region, it is considered an important key to the variable region and is thus used as the subject of analysis.

As used herein, “front of CDR3 on a reference V region” refers to the sequence corresponding to the front of CDR3 in a V region targeted by the present invention.

As used herein, “end of CDR3 on a reference J” refers to the sequence corresponding to the end of CDR3 in a J region targeted by the present invention.
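Given the two reference anchors just defined, the CDR3 segment of a read can be cut out once the V and J alignments are known. The sketch below assumes ungapped alignments near the CDR3 flanks and 0-based coordinates. The function and argument names are hypothetical. How the anchors are located on each reference is outside the scope of this sketch.

```python
from typing import Optional


def cdr3_nucleotides(read: str, v_offset: int, v_cdr3_front: int,
                     j_offset: int, j_cdr3_end: int) -> Optional[str]:
    """Cut the CDR3 nucleotide sequence out of a read.

    v_offset / j_offset: read coordinate minus reference coordinate for the
        V and J alignments (assumed ungapped over the CDR3 flanks).
    v_cdr3_front: 0-based position where CDR3 starts on the reference V.
    j_cdr3_end: 0-based position of the last CDR3 base on the reference J.
    Returns None if the projected coordinates fall outside the read or are
    out of order.
    """
    start = v_cdr3_front + v_offset
    end = j_cdr3_end + j_offset + 1  # exclusive end
    if not 0 <= start < end <= len(read):
        return None
    return read[start:end]


def in_frame(cdr3: str) -> bool:
    """An in-frame CDR3 has a nucleotide length that is a multiple of three."""
    return len(cdr3) % 3 == 0


read = "GGGGTGTGCCAGCTTTGGGAA"  # toy read
print(cdr3_nucleotides(read, v_offset=0, v_cdr3_front=4, j_offset=10, j_cdr3_end=5))
# prints TGTGCCAGCTTT
```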

As used herein, “condition tolerating random mutations to be scattered throughout” refers to any condition under which random mutations scattered throughout a sequence are tolerated. For example, such a condition is often expressed by the following BLAST/FASTA optimal parameters: a maximum mismatch of 33% is tolerated across the full length of an alignment, and a maximum nonconsecutive mismatch of 60% is tolerated for any 30 bp therein. A functional equivalent, such as an isotype of a molecule (e.g., IgG) used in the present invention, can be found by searching a database or the like. As used herein, “search” refers to utilizing a certain nucleic acid base sequence electronically, biologically, or by another method, preferably electronically, to find another nucleic acid base sequence having a specific function and/or property. Examples of electronic search include, but are not limited to, BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)), FASTA (Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85: 2444-2448 (1988)), the Smith and Waterman method (Smith and Waterman, J. Mol. Biol. 147: 195-197 (1981)), the Needleman and Wunsch method (Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970)) and the like. BLAST is typically used. Examples of biological search include, but are not limited to, stringent hybridization, a macroarray with genomic DNA applied to a nylon membrane or the like, a microarray with genomic DNA applied to a glass plate (microarray assay), PCR, in situ hybridization and the like. Herein, a gene used in the present invention is intended to include corresponding genes identified by such electronic or biological search.
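The example BLAST/FASTA condition can be expressed as a check on a finished pairwise alignment. This sketch makes two assumptions. First, gaps count as mismatches. Second, “nonconsecutive mismatch” is read as the fraction of mismatched columns in every 30 bp sliding window. The function name is hypothetical.

```python
def passes_mismatch_tolerance(aligned_query: str, aligned_ref: str,
                              max_total: float = 0.33, window: int = 30,
                              max_window: float = 0.60) -> bool:
    """Check an alignment against overall and windowed mismatch limits.

    Both strings are the two rows of a pairwise alignment (same length,
    '-' for gaps). Any column where the rows differ counts as a mismatch.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("alignment rows must have equal length")
    mismatch = [int(q != r) for q, r in zip(aligned_query, aligned_ref)]
    n = len(mismatch)
    if n == 0:
        return False
    if sum(mismatch) / n > max_total:  # full-length limit (33%)
        return False
    if n >= window:
        count = sum(mismatch[:window])  # first 30 bp window
        if count / window > max_window:
            return False
        for i in range(window, n):  # slide the window one column at a time
            count += mismatch[i] - mismatch[i - window]
            if count / window > max_window:
                return False
    return True


ref = "A" * 100
clustered = "C" * 30 + "A" * 70  # 30% overall, but all in one window
print(passes_mismatch_tolerance(clustered, ref))  # prints False
```

This example shows why the windowed limit matters. The clustered query passes the 33% full-length limit, but it is rejected because its mismatches are concentrated rather than scattered.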

Preferred Embodiments

Preferred embodiments of the present invention are described below. The embodiments are provided for better understanding of the present invention, and it is understood that the scope of the present invention should not be limited to the following descriptions. Further, it is apparent that those skilled in the art can readily make modifications within the scope of the present invention while referring to the descriptions herein. Those skilled in the art can also appropriately combine any of these embodiments.

(Unbiased Sample Amplification)

The present invention can use next generation sequencing techniques to prepare a sample for quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR). Such sequencing techniques can obtain a million or more reads from a sample at a reasonable cost. By using these techniques, even a genotype present at a low frequency of 1/1,000,000 or less can be detected in a specific and unbiased manner. An unbiased amplification method is thus achieved for amplifying all different types of sequences of a specific portion of a gene or transcript from a sample derived from DNA of blood, bone marrow or the like.

In one aspect, the present invention provides a method of preparing a sample for quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or B cell receptor (BCR) by genetic sequence analysis using a database, the method comprising the steps of: (1) synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template; (2) synthesizing a double stranded complementary DNA by using the complementary DNA as a template; (3) synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA; (4) performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer, wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; (5) performing a second PCR amplification reaction by using a PCR amplicon of (4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; and (6) performing a third PCR amplification reaction by using a PCR amplicon of (5), an added common adaptor primer in which the nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence and a molecule identification (MID Tag) sequence are added to a third TCR or BCR C region specific sequence; wherein the third TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the second TCR or BCR C region specific primer, but to comprise a sequence that is not homologous with other genetic sequences, and to comprise a mismatching base between subtypes downstream when amplified; the first additional adaptor nucleic acid sequence is a sequence suitable for binding to a DNA capturing bead and for an emPCR reaction; the second additional adaptor nucleic acid sequence is a sequence suitable for an emPCR reaction; and the molecule identification (MID Tag) sequence is a sequence for imparting uniqueness such that an amplicon can be identified.
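The role of the MID tag can be illustrated in two steps: assembling the third-round primer, then sorting reads back to their samples. This Python sketch makes two assumptions. The primer is assembled in the 5′→3′ order adaptor, MID, C region specific sequence. The MID appears at the very start of each read after platform-specific processing. The MID sequences, sample names and function names are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def third_round_primer(adaptor: str, mid: str, c_region_specific: str) -> str:
    """Assemble an adaptor-added primer, assumed 5'->3' order: adaptor, MID, C-region part."""
    return adaptor + mid + c_region_specific


def demultiplex(reads: Iterable[str], mid_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort reads by a MID tag assumed to sit at the very start of each read.

    The MID set must be prefix-free so that at most one tag can match.
    The tag is stripped from assigned reads; unmatched reads are kept
    unchanged under "unassigned".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                bins[sample].append(read[len(mid):])
                break
        else:  # no MID matched this read
            bins["unassigned"].append(read)
    return dict(bins)


mids = {"AAAAACCCCC": "sample_A", "GGGGGTTTTT": "sample_B"}  # hypothetical MIDs
print(demultiplex(["AAAAACCCCCTTGA", "GGGGGTTTTTCCAT", "TTTTTTTTTT"], mids))
```

Because each MID is unique to one amplicon library, several libraries can be pooled in a single sequencing run and separated afterwards.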

Conventional methods could not achieve unbiasedness in the truest sense, whereas the present invention can achieve unbiased amplification and perform accurate analysis. SMART PCR or the like is used in some cases for unbiased amplification. However, this method cannot achieve precise unbiasedness, for the following reasons. SMART PCR is a method utilizing the terminal transferase activity of a reverse transcriptase derived from Moloney Murine Leukemia Virus (MMLV). When the reverse transcriptase reaches the 5′ terminal of the mRNA template in a complementary strand DNA synthesis reaction, a secondary reaction adds mainly C bases to the 3′ terminal of the newly synthesized complementary DNA, and SMART PCR utilizes this reaction. A primer (TS oligo) having a base sequence (GGG) complementary to the added bases (CCC) at the 3′ terminal is used to switch the template during the reverse transcription reaction, resulting in synthesis of a double strand. The method is known to have the disadvantage that addition reactions of the TS oligo occur continuously to form a TS oligo concatemer (Villanyi Z, Mai A, Szabad J. Repeated template switching: Obstacles in cDNA libraries and ways to avoid them. The Open Genomics Journal, 2012, 5, 1-6). Further, the method is known to have the disadvantage that, in a gene with a sequence identical or similar to the 3′ side sequence of the TS oligo, progression of the polymerase is inhibited by the TS oligo, producing a bias (Tang D T, Plessy C, Salimullah M, Suzuki A M, Calligaris R, Gustincich S, Carninci P. Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Res. 2013 Feb. 1; 41(3): e44). In fact, microarray analysis has shown a low correlation between SMART PCR and a standard reverse transcription reaction or in vitro transcription (Puskas L G, Zvara A, Hackler L Jr, Van Hummelen P. RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques. 2002 June; 32(6): 1330-4, 1336, 1338, 1340). Further, it has been reported in repeated tests of each detection method that SMART PCR exhibits lower reproducibility than the other two methods (Puskas L G, et al., Biotechniques. 2002 June; 32(6): 1330-4, 1336, 1338, 1340).

In one embodiment where quantitative analysis is performed on a repertoire of a variable region of a BCR, the C region specific primer comprises a sequence that is a complete match with an isotype C region of interest selected from the group consisting of IgM, IgA, IgG, IgE and IgD and that is not homologous with other C regions. Preferably, for IgG or IgA, the C region specific primer is a sequence that is a complete match with one of the subtypes IgG1, IgG2, IgG3 and IgG4 or one of the subtypes IgA1 and IgA2, respectively. In another embodiment where quantitative analysis is performed on a repertoire of a variable region of a TCR, the C region specific primer is a sequence that is a complete match with a C region of a chain of interest selected from the group consisting of the α chain, β chain, γ chain and δ chain and that is not homologous with other C regions.

In another embodiment, it is preferable that a portion of a sequence that is a complete match with all C region allelic sequences of the same isotype in the database is selected for the C region specific primer. Such selection of a complete match enables highly precise analysis.

In a preferred embodiment, the common adaptor primer is designed to meet three conditions. It is unlikely to form homodimer and intramolecular hairpin structures and can stably form a double strand. It is not highly homologous with any TCR genetic sequence in the database. It has the same level of Tm as the C region specific primer. Examples of such a common adaptor primer sequence include TAATACGACTCCGAATTCCC (P20EA; SEQ ID NO: 2), GGGAATTCGG (P10EA; SEQ ID NO: 3) and the like.

In a preferred embodiment, the common adaptor primer selected is one designed to have neither homodimer nor intramolecular hairpin structures and to have no homology with other genes, including BCR or TCR genes. Examples of such a common adaptor primer sequence include P20EA, P10EA and the like.

In a specific embodiment, the common adaptor primer is P20EA and/or P10EA, whose sequences are TAATACGACTCCGAATTCCC (P20EA; SEQ ID NO: 2) and GGGAATTCGG (P10EA; SEQ ID NO: 3), respectively.

In a preferred embodiment, the first, second and third TCR or BCR C region specific primers are each independently a primer for BCR repertoire analysis. Each primer is selected to be a sequence that is a complete match with each isotype C region of IgM, IgG, IgA, IgD or IgE and a complete match with the subtypes for IgG and IgA, that is not homologous with other sequences comprised in the database, and that comprises a mismatching base between subtypes downstream of the primer. The common adaptor primer sequence is designed to have a base length suitable for amplification, to be unlikely to form homodimer and intramolecular hairpin structures, and to stably form a double strand. It is also designed not to be homologous with genetic sequences other than a target sequence in the database (or not homologous with other genes, including BCR or TCR genes, other than the target sequence), and to have the same level of Tm as the C region specific primer. Examples of such a sequence include, but are not limited to, P20EA (TAATACGACTCCGAATTCCC (SEQ ID NO: 2)) and P10EA (GGGAATTCGG (SEQ ID NO: 3)).

In a preferred embodiment, the first, second and third TCR C region specific primers are each independently a primer for TCR or BCR repertoire analysis. Each primer is selected to be a sequence that is a complete match with one type of α chain (TRAC), two types of β chains (TRBCO1 and TRBCO2), two types of γ chains (TRGC1 and TRGC2), and one type of δ chain (TRDC1), that is not homologous with other sequences comprised in the database, and that comprises a mismatching base between subtypes downstream of the primer. The common adaptor primer sequence is designed to have a base length suitable for amplification, to be unlikely to form homodimer and intramolecular hairpin structures, and to stably form a double strand. It is also designed not to be highly homologous with any TCR genetic sequence in the database, and to have the same level of Tm as the C region specific primer. Examples of such a sequence include, but are not limited to, P20EA (TAATACGACTCCGAATTCCC (SEQ ID NO: 2)) and P10EA (GGGAATTCGG (SEQ ID NO: 3)).

In a preferred embodiment, the third TCR or BCR C region specific primer is set in a region within about 150 bases from the 5′ terminal side of a C region. The first TCR or BCR C region specific primer and the second TCR or BCR C region specific primer are set in a region between the 5′ terminal side of a C region and about 300 bases therefrom.

In a preferred embodiment, the first, second and third TCR or BCR C region specific primers are each independently for BCR quantitative analysis, and separate specific primers are set for the five types of isotype sequences. The primers are designed to completely match a target sequence while ensuring a mismatch of 5 bases or more with other isotypes. They are also designed to be a complete match with all subtypes, so that one type of primer is compatible with each of the similar IgG subtypes (IgG1, IgG2, IgG3 and IgG4) or IgA subtypes (IgA1 and IgA2). Examples of such sequences include, but are not limited to, the following, which are used in the Examples: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16) and CE3-GS (SEQ ID NO: 19).
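The on-target and off-target requirements above can be screened with a simple ungapped comparison. The requirements are a complete match to the intended isotype and at least 5 mismatches against other isotypes. The sketch below is an illustration under the ungapped assumption and checks one orientation only. The function names and the toy sequences in the tests are hypothetical.

```python
from typing import Iterable


def min_mismatches(primer: str, target: str) -> int:
    """Fewest mismatches of the primer against any ungapped window of target."""
    k = len(primer)
    if len(target) < k:
        return k  # treat a too-short target as fully mismatched
    return min(
        sum(a != b for a, b in zip(primer, target[i:i + k]))
        for i in range(len(target) - k + 1)
    )


def isotype_primer_ok(primer: str, own_alleles: Iterable[str],
                      other_isotypes: Iterable[str], min_off_target: int = 5) -> bool:
    """Exact match in every allele of the intended isotype, and at least
    min_off_target mismatches against every other isotype sequence."""
    on_target = all(primer in seq for seq in own_alleles)
    off_target = all(min_mismatches(primer, seq) >= min_off_target
                     for seq in other_isotypes)
    return on_target and off_target
```

A real screen would run this against full C region sequences of every isotype and allele in the reference database. Here `min_mismatches` plays the role of the “mismatch of 5 bases or more” test.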

In a preferred embodiment, parameters in primer design are set to: a base sequence length of 18-22 bases; a melting temperature of 54-66° C.; and % GC (% guanine-cytosine content) of 40-65%. Preferably, in addition to these parameters, the following are set: a self-annealing score of 26; a self-end annealing score of 10; and a secondary structure score of 28 (for the Roche sequencer used in the Examples). Although these preferred values of base sequence length and the like may vary depending on the sequencer model, those skilled in the art can appropriately set parameters in accordance with the sequencer model.
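A rough first-pass filter for the length, melting temperature and % GC ranges can be written as follows. The Wallace rule used here, Tm = 2(A+T) + 4(G+C) °C, is a crude approximation chosen for brevity. Primer design software typically uses nearest-neighbor thermodynamics, so its Tm values will differ. The self-annealing and secondary-structure scores are not modeled, and the function names are hypothetical.

```python
from typing import Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule Tm in deg C: 2*(A+T) + 4*(G+C); rough, for short oligos only."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def meets_design_parameters(seq: str,
                            length: Tuple[int, int] = (18, 22),
                            tm_c: Tuple[float, float] = (54, 66),
                            gc: Tuple[float, float] = (40, 65)) -> bool:
    """Check length, Tm and %GC against the ranges given in the text."""
    return (length[0] <= len(seq) <= length[1]
            and tm_c[0] <= wallace_tm(seq) <= tm_c[1]
            and gc[0] <= gc_percent(seq) <= gc[1])


p20ea = "TAATACGACTCCGAATTCCC"
print(len(p20ea), wallace_tm(p20ea), gc_percent(p20ea))  # prints 20 58 45.0
print(meets_design_parameters(p20ea))                    # prints True
print(meets_design_parameters("GGGAATTCGG"))             # prints False (10 bases)
```

By this approximation, P20EA falls inside all three ranges. P10EA fails only on length, which is expected for a 10-base adaptor primer rather than a C region specific primer.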

In a preferred embodiment, conditions for a method of determining the sequences of the first, second and third TCR or BCR C region specific primers include the following: 1. a plurality of subtype sequences and/or allelic sequences are uploaded into base sequence analysis software and aligned; 2. primer design software is used to search for a plurality of primers satisfying the parametric conditions in a C region; 3. a primer in a region without a mismatching base in the sequences aligned in 1 is selected; and 4. the presence of a plurality of mismatching sequences for each subtype and/or allele downstream of the primer determined in 3 is confirmed, and if there is no such sequence, a primer is searched for further upstream, which is optionally repeated.
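Steps 3 and 4 of this procedure can be sketched on a gapless multiple alignment. The sketch assumes the sequences are written in the primer's own orientation, so that “downstream” means higher alignment indices. For a C region primer whose sequence is antisense to the transcript, this means aligning the reverse complement of the C region sequences. The sketch skips the parameter filtering of step 2. The function names and the 100-base downstream window are hypothetical. Scanning every start position covers the “search further upstream” fallback.

```python
from typing import List, Sequence, Tuple


def variable_columns(aligned: Sequence[str]) -> List[int]:
    """Alignment columns where the subtype/allele sequences disagree."""
    return [i for i in range(len(aligned[0])) if len({seq[i] for seq in aligned}) > 1]


def candidate_primer_sites(aligned: Sequence[str], primer_len: int = 20,
                           downstream: int = 100) -> List[Tuple[int, str]]:
    """Return (start, sequence) for windows that are identical in all sequences
    (step 3) and have at least one variable column within `downstream` bases
    after the window (step 4)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    variable = variable_columns(aligned)
    variable_set = set(variable)
    sites = []
    for start in range(length - primer_len + 1):
        end = start + primer_len
        # Step 3: no mismatching base inside the primer footprint.
        if any(i in variable_set for i in range(start, end)):
            continue
        # Step 4: at least one mismatching base downstream of the primer.
        if any(end <= i < end + downstream for i in variable):
            sites.append((start, aligned[0][start:end]))
    return sites
```

A window that passes step 3 is, by construction, a complete match with every aligned subtype and allele.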

In a preferred embodiment, positions are counted from the first base of the first codon of the C region sequence produced by splicing as the baseline. The first TCR or BCR C region specific primer is set at a position within bases 41-300, the second TCR or BCR C region specific primer at a position within bases 21-300, and the third TCR or BCR C region specific primer at a position within 150 bases. These positions comprise a mismatching site in a subtype and/or allele.
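The position rules can be encoded as windows counted from the first base of the first codon of the spliced C region. This sketch assumes 1-based, inclusive coordinates and requires the whole primer footprint to lie inside its window. It does not check the mismatch-site requirement, and the names are hypothetical.

```python
from typing import Dict, Tuple

# Allowed windows (1-based, inclusive) counted from the first base of the first
# codon of the spliced C region, as stated in the text.
PRIMER_WINDOWS: Dict[str, Tuple[int, int]] = {
    "first": (41, 300),
    "second": (21, 300),
    "third": (1, 150),
}


def within_window(primer: str, start: int, length: int) -> bool:
    """True if the whole primer footprint [start, start + length - 1] lies in its window."""
    low, high = PRIMER_WINDOWS[primer]
    return low <= start and start + length - 1 <= high


print(within_window("third", 131, 20))  # prints True (footprint ends at base 150)
print(within_window("first", 30, 20))   # prints False (starts before base 41)
```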

In a preferred embodiment, the first TCR or BCR C region specific primer can have, but is not limited to, the following structure: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CA1 (SEQ ID NO: 35), CB1 (SEQ ID NO: 37) or the like.

In a preferred embodiment, the second TCR or BCR C region specific primer can have, but is not limited to, the following structure: CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CA2 (SEQ ID NO: 35), CB2 (SEQ ID NO: 37) or the like.

In a preferred embodiment, the third TCR or BCR C region specific primer can have, but is not limited to, the following structure: CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16), CE3-GS (SEQ ID NO: 19) or the like.

In a preferred embodiment, each of the TCR or BCR C region specific primers is provided in a set compatible with all TCR or BCR subclasses. Specific sequences thereof include the following: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CM3-GS (SEQ ID NO: 7), CA3-GS (SEQ ID NO: 10), CG3-GS (SEQ ID NO: 13), CD3-GS (SEQ ID NO: 16), CE3-GS (SEQ ID NO: 19), CA1 (SEQ ID NO: 35), CB1 (SEQ ID NO: 37), CA2 (SEQ ID NO: 35), CB2 (SEQ ID NO: 37) and the like.

(Large-Scale Analysis)

In another aspect, the present invention provides a method of performing gene analysis using a sample manufactured by the method of the present invention.

Gene analysis can be performed by using any analysis technology. For example, the V, D, J, and C sequences of each read can be assigned by using V, D, J, and C sequences obtained from the known IMGT (the international ImMunoGeneTics information system, http colon//www dot imgt dot org) database as reference sequences, together with IMGT's HighV-Quest or novel software (Repertoire Genesis) developed by the Applicant, which was filed concurrently and is described herein as a preferred example of an analysis system.

In a preferred embodiment, the gene analysis is a quantitative analysis of the repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR).

Different sequences can be distinguished by sequencing individual amplified molecules. Sequencing therefore has the sensitivity to detect quantitative changes in clonal proliferation. In summary, one embodiment of the present invention provides a method of determining a profile of recombined DNA sequences in T cells and/or B cells. The method may comprise the steps of: isolating a sample from a subject; performing one or more rounds of nucleic acid amplification and spatial isolation of individual nucleic acids; and sequencing the nucleic acids.

One aspect provides a method of determining a correlation of one or more repertoires in a subject or an individual. Another aspect provides a method of developing an algorithm capable of predicting a correlation of one or more repertoires in any sample derived from a subject having a disease. Another aspect provides a method of using an algorithm capable of predicting a correlation of one or more repertoires in any sample derived from a subject to find a correlation of one repertoire of an individual or a correlation of a plurality of repertoires. Another aspect provides a method of creating an algorithm that calculates a disease activity score. Another aspect provides a method of monitoring the condition of a disease in an individual.

(Analysis System)

The present invention provides bioinformatics for performing quantitative analysis of a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) by using a next generation sequencing technique.

In one aspect, the present invention is a method of analyzing a TCR or BCR repertoire, comprising the following steps: (1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning (preferably, assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides); (5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence (preferably, translating the nucleic acid sequence of the CDR3 into an amino acid sequence and classifying the D region by utilizing the amino acid sequence); and (6) calculating a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire.

Each step of the present invention is explained below together with the specific operation of each system or apparatus, with reference to the flow chart in FIG. 43.

FIG. 43 is a flow chart showing the processing flow of the method of analyzing a TCR or BCR repertoire in the gene analysis system of the present invention. Each of the symbols S1-S6 in the Figure corresponds to steps (1)-(6) in the following explanation.

In the method of the present invention, step (1), providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region, can be accomplished, for the V region for example, by appropriately selecting and providing a database that contains information on the V region.

In the method of the present invention, step (2), providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length, is accomplished by optionally trimming the sequences with a function of appropriate software or the like and optionally providing an input sequence set extracted to an appropriately selected length. The input sequences may be, for example, a set of amplicons amplified by a known method or a set of amplicons amplified by PCR with an unbiased method as described in the application submitted concurrently with the present application.

In the method of the present invention, step (3), searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele, is performed by using appropriate homology search software to search the input sequence set against the reference database for each gene region (for example, the V region) and recording the resulting alignment with an approximate reference allele and/or the sequence of that reference allele. In FIGS. 29 and 30, the box labeled “BLAST” or “BLAST analysis”, the IMGT database below it, and the vertical double line connecting the two correspond to this step.

In the method of the present invention, step (4), assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning, can be accomplished by determining the V region and/or J region from the sequence alignment on the basis of known information. The extraction can be accomplished by assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides. In FIGS. 29 and 30, defining both ends of a region, as shown by the horizontal arrow under Dno, with the horizontal arrows under V and under J as guides, corresponds to extraction of a CDR3 sequence.
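The CDR3 extraction in step (4) can be sketched as follows. This is a minimal illustration, not the Repertoire Genesis implementation: it assumes the homology search returns an ungapped alignment for the V hit and the J hit (a start coordinate on the reference and the matching start on the read), and that the CDR3 start on each reference V allele and the CDR3 end on each reference J allele are already known. The names `SegmentAlignment`, `ref_to_read`, and `extract_cdr3` are hypothetical; real alignments contain gaps and would need a full coordinate map.

```python
from dataclasses import dataclass


@dataclass
class SegmentAlignment:
    """Hypothetical ungapped alignment of a reference segment (V or J) to a read."""
    ref_start: int   # 0-based start of the aligned stretch on the reference
    read_start: int  # 0-based start of the same stretch on the read


def ref_to_read(aln: SegmentAlignment, ref_pos: int) -> int:
    """Project a reference coordinate onto the read (valid only if ungapped)."""
    return aln.read_start + (ref_pos - aln.ref_start)


def extract_cdr3(read: str,
                 v_aln: SegmentAlignment, v_cdr3_start: int,
                 j_aln: SegmentAlignment, j_cdr3_end: int) -> str:
    """Cut the CDR3 out of a read using reference V and reference J as guides.

    v_cdr3_start: position on the reference V allele where CDR3 begins.
    j_cdr3_end:   position on the reference J allele just after CDR3 ends (exclusive).
    """
    start = ref_to_read(v_aln, v_cdr3_start)
    end = ref_to_read(j_aln, j_cdr3_end)
    if not 0 <= start < end <= len(read):
        raise ValueError("CDR3 guides fall outside the read")
    return read[start:end]


# Toy read: 4 bases of V, an 18-base CDR3, then 4 bases of J.
read = "AAAA" + "TGTGCCAGCAGCTTTTTC" + "GGGG"
cdr3 = extract_cdr3(read, SegmentAlignment(10, 0), 14, SegmentAlignment(0, 16), 6)
```

In the toy example, reference V position 14 maps to read position 4 and reference J position 6 maps to read position 22, so the slice returns the 18-base CDR3.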

In the method of the present invention, step (5), translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence, can be accomplished by translating the nucleic acid sequence into amino acids using a method known in the art and picking out the sequence corresponding to the D region by homology search or the like on the amino acid sequence. Preferably, the nucleic acid sequence of the CDR3 is translated into an amino acid sequence and the D region is classified by utilizing that amino acid sequence.
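Translation of a CDR3 nucleotide sequence into amino acids, as used in step (5), can be sketched with the standard genetic code (NCBI translation table 1). The sketch assumes the extracted CDR3 is in frame from its first base; a `*` in the output may indicate an out-of-frame (nonproductive) read. The function name `translate` is illustrative; libraries such as Biopython provide equivalent functionality.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(nucleotides: str) -> str:
    """Translate an in-frame nucleotide sequence.

    '*' marks a stop codon and 'X' a codon containing an ambiguous base;
    a trailing partial codon (1-2 bases) is ignored.
    """
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


cdr3_aa = translate("TGTGCCAGCAGCTTTTTC")  # -> "CASSFF"
```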

In the method of the present invention, step (6), calculating a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire, can be accomplished by calculating the frequency of appearance of the V region, D region, J region and/or C region determined in the above-described steps, for example, by organizing the frequencies into a list. A TCR or BCR repertoire can be derived thereby.
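Step (6) amounts to counting assignments and normalizing. A minimal sketch, assuming each read has already been reduced to a dictionary of region assignments (the keys `"V"`, `"J"` and the gene names below are illustrative only): a key function selects either a single region or a combination such as a V-J pair, and the result is a list-like mapping sorted from most to least frequent.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Mapping


def appearance_frequencies(assignments: Iterable[Mapping[str, str]],
                           key: Callable[[Mapping[str, str]], Hashable]
                           ) -> Dict[Hashable, float]:
    """Return {gene or gene combination: relative frequency}, most frequent first."""
    counts = Counter(key(a) for a in assignments)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.most_common()}


# Hypothetical per-read assignments produced by steps (3)-(5).
reads = [
    {"V": "TRBV5-1", "J": "TRBJ2-7"},
    {"V": "TRBV5-1", "J": "TRBJ1-1"},
    {"V": "TRBV6-5", "J": "TRBJ2-7"},
    {"V": "TRBV5-1", "J": "TRBJ2-7"},
]
v_usage = appearance_frequencies(reads, key=lambda a: a["V"])
vj_usage = appearance_frequencies(reads, key=lambda a: (a["V"], a["J"]))
```

Here `v_usage` gives TRBV5-1 a frequency of 0.75 and TRBV6-5 0.25, while `vj_usage` gives the (TRBV5-1, TRBJ2-7) pair 0.5.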

The following steps are further explained while referring to FIG. 42.

In S1 (step (1)), a reference database is provided. The database may be stored in an external storage apparatus 1405, but can generally be obtained as a publicly disclosed database through a communication device 1411. Alternatively, an input apparatus 1409 may be used to input a database and optionally record it in a RAM 1403 or the external storage apparatus 1405. In this way, a database comprising a region of interest such as a V region is provided.

In S2 (step (2)), an input sequence set is provided. For example, a set of sequence information obtained from a set of amplicons amplified in a PCR amplification reaction is input by using the input apparatus 1409 or through the communication device 1411. An apparatus that receives the amplicons of a PCR amplification reaction and performs genetic sequence analysis on them may be connected for this purpose, through a system bus 1420 or through the communication device 1411. Trimming and/or extraction to a suitable length can optionally be performed at this stage. Such processing is performed by a CPU 1401. A program for trimming and/or extraction may be provided via the external storage apparatus, the communication device, or the input apparatus.

In S3 (step (3)), alignment is performed. At this stage, a homology search is performed on the input sequence set against the reference database for each of the gene regions. For the homology search, the reference database obtained via the communication device 1411 or the like is processed with a homology search program, and the CPU 1401 performs the processing. The results are then analyzed to record the alignment with an approximate reference allele and/or a sequence of the reference allele. This is also processed by the CPU 1401. A program for executing these steps may be provided via the external storage apparatus, the communication device, or the input apparatus.

In S4 (step (4)), nucleic acid sequence information on the D region is detected. This step assigns a V region and a J region for the input sequence set, and the CPU 1401 then extracts a nucleic acid sequence of the D region based on the result of the assigning. Both the assigning and the extraction are processed by the CPU 1401, and a program for them may be provided via the external storage apparatus, the communication device, or the input apparatus. Preferably, this step is accomplished by determining the V region and/or J region from the sequence alignment on the basis of known information. The results can be stored in the RAM 1403 or the external storage apparatus 1405.

Preferably, the extraction is accomplished by assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides. This processing can also be performed by the CPU 1401, and a program for it may be provided via the external storage apparatus, the communication device, or the input apparatus.

In S5 (step (5)), the D region is classified. The nucleic acid sequence of the D region is translated into an amino acid sequence, and the D region is classified by utilizing the amino acid sequence. A sequence corresponding to the D region may be picked out by homology search or the like on the obtained amino acid sequence. Preferably, the nucleic acid sequence of the CDR3 is translated into an amino acid sequence to classify the D region by utilizing that amino acid sequence. Each of these operations is processed by the CPU 1401, and a program for each may be provided via the external storage apparatus, the communication device, or the input apparatus.

In S6 (step (6)), a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, is calculated based on the above-described classifying to derive a TCR or BCR repertoire. The calculating and deriving are also processed by the CPU 1401, and a program for this processing may be provided via the external storage apparatus, the communication device, or the input apparatus.

In one preferred embodiment, the gene regions used in the present invention comprise all of the V region, the D region, the J region and optionally the C region.

In one embodiment, the reference database is a database in which a unique ID is assigned to each sequence. By assigning IDs uniquely, a gene sequence can be analyzed on the basis of a simple indicator, namely its ID.
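A reference database keyed by unique IDs can be sketched as two lookup tables. This is an illustration under stated assumptions: records arrive as (allele name, sequence) pairs, IDs are sequential integers starting at 1 (an arbitrary choice), and `build_reference_db` is a hypothetical helper, not part of IMGT or any named tool.

```python
from typing import Dict, Iterable, Tuple


def build_reference_db(records: Iterable[Tuple[str, str]]
                       ) -> Tuple[Dict[int, Tuple[str, str]], Dict[str, int]]:
    """Assign a unique integer ID to each (allele name, sequence) record.

    Returns an ID -> (name, sequence) table and a name -> ID lookup.
    """
    by_id: Dict[int, Tuple[str, str]] = {}
    id_of: Dict[str, int] = {}
    for name, seq in records:
        if name in id_of:
            raise ValueError(f"duplicate reference entry: {name}")
        new_id = len(by_id) + 1  # IDs start at 1 (arbitrary choice)
        by_id[new_id] = (name, seq.upper())
        id_of[name] = new_id
    return by_id, id_of
```

Downstream steps can then store and compare plain integers instead of allele names and sequences.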

In one embodiment, the input sequence set is an unbiased sequence set. An unbiased sequence set can be obtained by PCR amplification with an unbiased method as described herein. When an unbiased method need not be precise, a “pseudo-unbiased method” of relatively low quality, such as the SMART method, may be used. Thus, “unbiased” as used herein refers to unbiasedness with the precision accomplished by the method of the present invention; a method that does not attain that level is referred to as a “pseudo-unbiased method”. When an unbiased method as described herein is to be particularly distinguished, the term “precisely unbiased” may be used. However, it is understood that even without the word “precisely”, unbiasedness herein is at the level accomplished by the method described herein.

In another embodiment, the sequence set is trimmed. Trimming removes unnecessary or unsuitable nucleic acid sequences, which enhances the efficiency of analysis.

In a preferred embodiment, trimming is accomplished by the steps of: deleting low quality regions from both ends of a read; deleting a region matching an adaptor sequence over 10 bp or more from both ends of the read; and using the read as a high quality read in the analysis when the remaining length is 200 bp or more (TCR) or 300 bp or more (BCR). Preferably, low quality refers to a 7 bp moving average of the QV value of less than 30.
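The trimming rule above can be sketched as follows. The sketch makes several assumptions that are not stated in the text: "low quality regions at the ends" is read as everything outside the first and last 7-bp windows whose mean QV is at least 30; adaptor overlaps are matched exactly (practical trimmers usually tolerate errors); and a 5′ overlap is a read starting with the adaptor's tail while a 3′ overlap is a read ending with the adaptor's head. The adaptor sequence used in a test would be illustrative only.

```python
from typing import Optional, Sequence

WINDOW = 7               # bases in the QV moving average
MIN_MEAN_QV = 30         # windows averaging below this are "low quality"
MIN_ADAPTOR_MATCH = 10   # shortest adaptor overlap that is removed
MIN_LENGTH = {"TCR": 200, "BCR": 300}


def quality_trim(seq: str, qvs: Sequence[int]) -> str:
    """Drop low-quality stretches from both ends using a 7-bp moving average of QV."""
    if len(seq) < WINDOW:
        return ""
    ok = [sum(qvs[i:i + WINDOW]) / WINDOW >= MIN_MEAN_QV
          for i in range(len(seq) - WINDOW + 1)]
    if not any(ok):
        return ""
    first = ok.index(True)                      # first passing window
    last = len(ok) - 1 - ok[::-1].index(True)   # last passing window
    return seq[first:last + WINDOW]


def adaptor_trim(seq: str, adaptor: str) -> str:
    """Remove exact adaptor overlaps of 10 bp or more at either end of the read."""
    for k in range(min(len(adaptor), len(seq)), MIN_ADAPTOR_MATCH - 1, -1):
        if seq.startswith(adaptor[-k:]):        # 5' end begins with the adaptor's tail
            seq = seq[k:]
            break
    for k in range(min(len(adaptor), len(seq)), MIN_ADAPTOR_MATCH - 1, -1):
        if seq.endswith(adaptor[:k]):           # 3' end runs into the adaptor's head
            seq = seq[:-k]
            break
    return seq


def trim_read(seq: str, qvs: Sequence[int], adaptor: str,
              receptor: str = "TCR") -> Optional[str]:
    """Return the trimmed high-quality read, or None if it is too short to use."""
    trimmed = adaptor_trim(quality_trim(seq, qvs), adaptor)
    return trimmed if len(trimmed) >= MIN_LENGTH[receptor] else None
```

Because the average is taken over a window, a window that passes can still contain one or two low-QV bases at its edge; a stricter variant would additionally trim individual bases below the threshold.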

In a preferred embodiment, the approximate sequence is the closest sequence. In a specific embodiment, the approximate sequence is determined by ranking on 1. the number of matching bases, 2. the kernel length, 3. the score, and 4. the alignment length, in that order of priority.
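The four-level ranking can be expressed as a lexicographic sort key. In this sketch, `Hit` and its field names are hypothetical, "kernel length" is assumed to be something like the length of the longest exact-match seed reported by the search, and larger values are assumed to be better at every level.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """One homology-search hit against a reference allele (fields are illustrative)."""
    allele: str
    matching_bases: int
    kernel_length: int      # assumed: length of the longest exact-match seed
    score: float
    alignment_length: int


def closest_reference(hits: Sequence[Hit]) -> Optional[Hit]:
    """Pick the closest allele: matching bases first, then kernel length,
    then score, then alignment length (larger is better at each level)."""
    if not hits:
        return None
    return max(hits, key=lambda h: (h.matching_bases, h.kernel_length,
                                    h.score, h.alignment_length))
```

Because Python compares tuples element by element, a later indicator only matters when all earlier indicators are tied.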

In another embodiment, the homology search is conducted under a condition that tolerates random mutations scattered throughout the sequence. Such a condition is often expressed by the following optimal BLAST/FASTA parameters: a maximum mismatch of 33% is tolerated across the full length of an alignment, and a maximum nonconsecutive mismatch of 60% is tolerated for any 30 bp therein. In one embodiment, the homology search uses, compared with the default condition, at least one of (1) a shorter window size, (2) a reduced mismatch penalty, (3) a reduced gap penalty, and (4) a ranking in which the number of matching bases is the top-priority indicator.
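The two tolerance limits can be checked on a finished alignment as sketched below. Reading "a maximum nonconsecutive mismatch of 60% for any 30 bp" as "at most 60% mismatches within any 30-bp sliding window" is an assumption of this sketch, as is counting a gap against a base as a mismatch. The function name is hypothetical.

```python
def within_mismatch_tolerance(query_aln: str, ref_aln: str,
                              max_overall: float = 0.33,
                              max_in_window: float = 0.60,
                              window: int = 30) -> bool:
    """Accept an alignment whose mismatches are scattered rather than clustered.

    Both inputs are equal-length aligned strings; a gap '-' opposite a base
    counts as a mismatch.
    """
    if len(query_aln) != len(ref_aln) or not query_aln:
        raise ValueError("expected two non-empty aligned strings of equal length")
    mismatch = [q != r for q, r in zip(query_aln.upper(), ref_aln.upper())]
    if sum(mismatch) / len(mismatch) > max_overall:
        return False
    width = min(window, len(mismatch))
    for i in range(len(mismatch) - width + 1):
        if sum(mismatch[i:i + width]) / width > max_in_window:
            return False
    return True
```

For a 90-bp alignment, 23 mismatches spread every fourth base pass both limits, whereas 20 mismatches packed into the first 20 bases pass the overall limit but fail the window limit.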

In another embodiment, the homology search is carried out under the following conditions in BLAST or FASTA:

V: mismatch penalty=−1, shortest alignment length=30, and shortest kernel length=15;

D: word length=7 (for BLAST) or K-tup=3 (for FASTA), mismatch penalty=−1, gap penalty=0, shortest alignment length=11, and shortest kernel length=8;

J: mismatch penalty=−1, shortest hit length=18, and shortest kernel length=10; and

C: shortest hit length=30 and shortest kernel length=15.

This condition can also be used, for example, in a situation where a shorter (about 200 bp) sequence is used to classify only part of a region (a situation that does not fall under the “preferred example”). It can also be used when an Illumina sequencer is used; in that case, bwa or bowtie may be considered for the homology search.
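The per-region search conditions above can be kept as data and applied as a post-filter on hits, as in the sketch below. How each value maps onto command-line options of a particular BLAST or FASTA build is deliberately not shown. Treating "shortest alignment length" (V, D) and "shortest hit length" (J, C) as a single `min_length` field is a simplification, and `RegionSearchParams`, `SEARCH_PARAMS`, and `accept_hit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionSearchParams:
    """Per-region homology-search settings; None means 'leave the tool default'."""
    min_length: int                          # shortest alignment / hit length (bp)
    min_kernel_length: int                   # shortest kernel length (bp)
    mismatch_penalty: Optional[int] = None
    gap_penalty: Optional[int] = None
    blast_word_length: Optional[int] = None
    fasta_ktup: Optional[int] = None


SEARCH_PARAMS = {
    "V": RegionSearchParams(min_length=30, min_kernel_length=15, mismatch_penalty=-1),
    "D": RegionSearchParams(min_length=11, min_kernel_length=8, mismatch_penalty=-1,
                            gap_penalty=0, blast_word_length=7, fasta_ktup=3),
    "J": RegionSearchParams(min_length=18, min_kernel_length=10, mismatch_penalty=-1),
    "C": RegionSearchParams(min_length=30, min_kernel_length=15),
}


def accept_hit(region: str, hit_length: int, kernel_length: int) -> bool:
    """Apply the per-region length thresholds to a single hit."""
    p = SEARCH_PARAMS[region]
    return hit_length >= p.min_length and kernel_length >= p.min_kernel_length
```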

In a specific embodiment, the D region is classified by the frequency of appearance of the amino acid sequence.

In a further embodiment, when a reference database for the D region exists in step (5), a combination of the result of the homology search with the nucleic acid sequence of CDR3 and the result of amino acid sequence translation is used as the classification result.

In another embodiment, when no reference database for the D region exists in step (5), only the frequency of appearance of the amino acid sequence is used for classification.
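The two D classification modes can be sketched as a counting step. How the nucleotide D hit and the amino acid result are "combined" is not specified in the text; joining them into a single label such as `TRBD1*01|CASSF` is an assumption made purely for illustration, as are the function name `classify_d` and the `D-unassigned` placeholder.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_d(cdr3_aa: Sequence[str],
               d_hits: Optional[Sequence[Optional[str]]] = None) -> Counter:
    """Count D-region classes.

    Without a D reference database (d_hits is None), the class is the CDR3
    amino acid sequence alone. With one, the nucleotide D hit (or None when
    no hit was found) and the amino acid sequence are joined into one label.
    """
    if d_hits is None:
        return Counter(cdr3_aa)
    if len(d_hits) != len(cdr3_aa):
        raise ValueError("one D hit (or None) is expected per CDR3")
    return Counter(f"{d or 'D-unassigned'}|{aa}" for d, aa in zip(d_hits, cdr3_aa))
```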

In a specific embodiment, the frequency of appearance is counted per gene name and/or per allele.
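Counting per gene versus per allele differs only in how assignment names are collapsed. This sketch assumes IMGT-style names in which the allele number follows an asterisk (for example, `TRBV5-1*01` is allele 01 of gene `TRBV5-1`); `count_appearances` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def count_appearances(allele_names: Iterable[str], unit: str = "gene") -> Counter:
    """Count assignments per allele (e.g. 'TRBV5-1*01') or per gene ('TRBV5-1').

    Assumes IMGT-style names in which the allele number follows '*'.
    """
    if unit == "allele":
        return Counter(allele_names)
    if unit == "gene":
        return Counter(name.split("*", 1)[0] for name in allele_names)
    raise ValueError("unit must be 'gene' or 'allele'")
```

With the names `TRBV5-1*01`, `TRBV5-1*02`, and `TRBV6-5*01`, gene-level counting merges the first two into `TRBV5-1`, while allele-level counting keeps all three separate.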

In another embodiment, step (4) comprises assigning the V region and the J region for the input sequence set and extracting a CDR3 sequence, using the start of CDR3 on the reference V region and the end of CDR3 on the reference J region as guides.

In a further embodiment, step (5) comprises translating the nucleic acid sequence of the CDR3 into an amino acid sequence and classifying the D region by using the amino acid sequence.

In one aspect, the present invention provides a system for analyzing a TCR or BCR repertoire, wherein the system comprises: (1) means for providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (2) means for providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3) means for searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (4) means for assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning; (5) means for translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and (6) means for calculating a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire.

In another aspect, the present invention provides a computer program for causing a computer to execute a method of analyzing a TCR or BCR repertoire, the method comprising the following steps: (1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning; (5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and (6) calculating a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire.

In still another aspect, the present invention provides a recording medium storing a computer program for causing a computer to execute a method of analyzing a TCR or BCR repertoire, the method comprising the following steps: (1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3) searching for homology of the input sequence set with the reference database for each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of the assigning; (5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and (6) calculating a frequency of appearance for each of the V region, the D region, the J region and optionally the C region, or a frequency of appearance of a combination thereof, based on the classifying in (5) to derive the TCR or BCR repertoire.

(System Configuration)

The configuration of a system 1 of the present invention is explained with reference to the functional block diagram in FIG. 42. The Figure shows a case in which the system is implemented as a single system.

The gene analysis system 1 of the present invention is configured by connecting a RAM 1403, an external storage apparatus 1405 such as a ROM, HDD, magnetic disk, or flash memory such as USB memory, and an input/output interface (I/F) 1425 via a system bus 1420 to the CPU 1401 installed in a computer system. An input apparatus 1409 such as a keyboard or a mouse, an output apparatus 1407 such as a display, and a communication device 1411 such as a modem are each connected to the input/output I/F 1425. The external storage apparatus 1405 comprises an information database storing section 1430 and a program storing section 1440, both of which are persistent storage regions reserved in the external storage apparatus 1405.

In this hardware configuration, when various instructions (commands) are input via the input apparatus 1409, or a command is received via the communication I/F, the communication device 1411, or the like, the CPU 1401 calls up a software program installed in the storage apparatus 1405, loads it into the RAM 1403, and executes it, thereby achieving the functions of the present invention in cooperation with an OS (operating system).

The reference database, the input sequence set, the created classification data, the TCR or BCR repertoire data and the like, as well as information obtained via the communication device 1411 or the like, are written to and continually updated in the database storage section 1430. Information on each sequence in each input sequence set, and information such as the ID of each gene in the reference database, is managed in master tables, so that information from the accumulated samples can be managed by the IDs defined in each master table.

As input sequence set entry information, a sample provider ID, sample information, the result of nucleic acid analysis, known individual/physiological information, and the result of TCR or BCR repertoire analysis are associated with an ID and stored in the database storage section 1430. Here, the result of TCR or BCR repertoire analysis is the information obtained by processing the nucleic acid analysis result with the processing of the present invention.

Further, a computer program stored in the program storing section 1440 configures the computer as a system implementing the above-described processing, e.g., processing such as trimming, extraction, alignment, assignment, classification, or translation. Each of these features is an independent computer program, or a module or routine thereof, which is executed by the above-described CPU 1401 to configure the computer as each system or apparatus. Each system described hereinafter is constructed through the cooperation of these functions.

(Repertoire Analysis System/Analysis Method)

In one aspect, the present invention provides a method of quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database. The method comprises: (1) providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner; (2) determining the nucleic acid sequence comprised in the nucleic acid sample; and (3) calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject. This method, and methods comprising one or more additional features explained herein, are called the “repertoire analysis method of the present invention” herein. A system implementing the repertoire analysis method of the present invention is referred to as the “repertoire analysis system of the present invention”.

Step (1) of the method of the present invention, providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner, may provide any sample as long as the sample is suitable for determining a nucleic acid sequence. Usable technologies include the above-described preferred amplification methods of the present invention as well as Reverse transcriptase-PCR, real-time PCR, digital PCR, emulsion PCR, amplified fragment length polymorphism (AFLP) PCR, allele specific PCR, assembly PCR, asymmetrical PCR, colony PCR, helicase-dependent amplification, hot start PCR, inverse PCR, in situ PCR, nested PCR, Touchdown PCR, loop-mediated isothermal amplification (LAMP), Nucleic acid sequence based amplification (NASBA), Ligase Chain Reaction, Branch DNA Amplification, Rolling Circle Amplification, Circle to circle Amplification, SPIA amplification, Target Amplification by Capture and Ligation (TACL), 5′-Rapid amplification of cDNA end (5′-RACE), 3′-Rapid amplification of cDNA end (3′-RACE), and Switching Mechanism at 5′-end of the RNA Transcript (SMART).

Step (2) of the method of the present invention, determining the nucleic acid sequence comprised in the nucleic acid sample, may use any method as long as a nucleic acid sequence can be determined. Because a large quantity of sequencing is generally required, an automated large-scale sequencing method is preferable. Examples of such sequencing methods include sequencing using a Roche 454 sequencer (GS FLX+, GS Junior), sequencing using the technology of an Ion Torrent sequencer (Ion PGM™ Sequencer), and sequencing using the technology of Illumina (Genome Analyzer IIx, HiSeq, MiSeq). Other sequencing methods include the Heliscope™ Sequencer, Helicos True Single Molecule Sequencing (tSMS) (Harris T. D. et al., Science 2008, 320: 106-109), SOLiD™ Sequencing (Life Technologies, Inc.), the Single Molecule Real Time (SMRT™) PacBio system (Pacific Biosciences, CA), Nanopore Sequencing (Oxford Nanopore Technologies, UK), LaserGen™ (LaserGen, Inc., CA) (reference: Litosh V A et al., Nucleic Acids Res. 2011 March; 39(6): e39), Lightspeed Genomics™ (Lightspeed Genomics, CA), GnuBIO (GnuBIO Inc., MA), Polonator sequencing (M. Danaher/Dover, Azco Biotec. Inc., CA), Mebious Biosystem's single molecule sequencing (Mebious Biosystems Limited), Millikan sequencing (Caerus Molecular Diagnostics, Inc), Intelligent Bio-Systems, Inc. (reference: Hutter D, et al., Nucleosides Nucleotides Nucleic Acids 2010; 29(11): 879-95), Hybridization-Assisted Nanopore Sequencing (Nabsys Inc., RI), Nanopore sequencing (Noblegen Biosciences, Inc.), Nanopore sequencing (Electronic Biosciences, CA), Thermosequencing (GENIUS™ technology) (Genapsys, Inc., CA), CAERUS MOLECULAR DIAGNOSTICS, INC, CA, Individual Molecule Placement Rapid Nanotransfer (IMPRNT) (Halcyon Molecular, Inc), monochromatic aberration-corrected dual-beam low energy electron microscopy (Electron Optica, Inc., CA), ZS Genesis DNA Sequencing (ZS Genetics, Inc), and the like.
A Roche 454 sequencer creates single stranded DNA bound to two types of adaptors, which specifically bind to the 3′ terminal and the 5′ terminal. The single stranded DNA is bound to a bead via an adaptor and enclosed in a water-in-oil emulsion to form a microreactor containing a bead and a DNA fragment. The gene of interest is then amplified by emulsion PCR in the water-in-oil emulsion. The beads are applied to a picotiter plate and sequenced. When a dNTP is incorporated into DNA by DNA polymerase, pyrophosphate is released, and sulfurylase uses it as a substrate to generate ATP (pyrosequencing). With the ATP and luciferin as substrates, luciferase emits light, which is detected with a CCD camera to determine the base sequence. In the Ion Torrent technology, emulsion PCR is performed by the same method as in the Roche system, and the beads are then transferred to a microchip, where the sequencing reaction is performed. For detection, the hydrogen ions released when DNA is extended by the polymerase are detected on a semiconductor chip and converted to a base sequence. Illumina sequencing sequences a DNA of interest while amplifying and synthesizing it on a flow cell, using bridge PCR and sequencing-by-synthesis. For bridge PCR, single stranded DNA with different adaptor sequences added to its two ends is created. An adaptor sequence for the 5′ terminal side is immobilized on the flow cell in advance, and the DNA is immobilized onto the flow cell through an extension reaction. Similarly, an adaptor for the 3′ terminal side is immobilized at an adjacent position and binds to the 3′ terminal of the synthesized DNA, forming a so-called bridge while a double stranded DNA is synthesized. Bridge binding, extension, and denaturation are then repeated, so that numerous single stranded DNA fragments are locally amplified to form an accumulated cluster. Sequencing is performed with this single stranded DNA as the template.
For sequencing-by-synthesis, after a sequencing primer is added, a single-base synthesis reaction is performed by DNA polymerase with 3′-terminal-blocked fluorescent dNTPs. The fluorescent substance bound to the base is excited by laser light, and the emitted light is recorded as an image by a fluorescence microscope. The base sequence is then determined by removing the fluorescent substance and the block, performing the next extension reaction, and detecting fluorescence again. It is advantageous to sequence a plurality of sequences in a single sequencing run. It is also advantageous that a longer sequence length can be read at once.

For step (3) of the present invention, calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject, any technique can be used as long as the frequency of appearance of genes and combinations thereof can be calculated and a TCR repertoire and/or BCR repertoire can be derived. For example, the analysis tool HighV-Quest provided by IMGT can be used, in addition to the preferred examples of the analysis methods described above. Other technologies can also be used by employing software that implements an alignment or mapping feature, i.e., AbMapper, ALLPATHS, Arachne, BACCardI, Bfast, BLAT, Bowtie, BWA-MEM, BWA-SW, BWA, CCRaVAT & QuTie, CLC workstation, CNV-seq, Elvira, ERNE-map (rNA), GSMapper, Glimmer, gnumap, Goseq, ICAtools, LOCAS, MapSplice, Maq, MEMS, Mosaik, NGSView, Novoalign, OSLay, Partek, Perm, Projector, Qpalma, RazerS, SHARCGS, SHRiMP2, SNP-o-matic, Splicemap, SSAHA2, Stampy, Tablet, TMAP, Tophat, or Velvet.

In one embodiment, the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR), and step (2) determines these nucleic acid sequences in a single sequencing run. The method of the present invention can reduce or eliminate the bias that can arise when a plurality of types of sequences are determined in a single sequencing run. The present invention is therefore especially useful for accurately detecting TCR or BCR reads that occur at a low frequency.

In another embodiment, the single sequencing run is characterized in that at least one of the primers used to amplify the nucleic acid sample into a sample for sequencing has the same sequence as a nucleic acid sequence encoding a C region or its complementary strand. By using such a primer, any TCR or BCR can be amplified in the same manner, which achieves unbiasedness.

In another embodiment, the single sequencing run is characterized in being performed with a common adaptor primer. In a preferred embodiment, the common adaptor primer is designed to have a base length suitable for amplification, to be unlikely to form homodimers or intramolecular hairpin structures, and to stably form a double strand, and is designed not to be highly homologous with any TCR genetic sequence in the database and/or to have a melting temperature (Tm) at the same level as that of the C region specific primer. More preferably, a common adaptor primer designed to have neither homodimer nor intramolecular hairpin structures, nor homology with other genes including BCR or TCR genes, is selected. In a specific embodiment, the common adaptor primer is P20EA (SEQ ID NO: 2) and/or P10EA (SEQ ID NO: 3).
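The design criteria above (Tm matched to the C region specific primer, no homodimer or hairpin) can be screened roughly as sketched below. This is an illustration under stated assumptions, not the procedure used to design P20EA or P10EA: Tm is estimated with the common basic formula Tm = 64.9 + 41(nGC - 16.4)/N, which ignores salt and nearest-neighbor effects; self-complementarity is flagged when any k-mer (k = 4 assumed) has its reverse complement elsewhere in the same primer; the 2 °C tolerance is an assumed value; and the sequences are hypothetical, not SEQ ID NO: 2 or 3. The homology check against the TCR/BCR database is omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def basic_tm(seq):
    """Rough Tm in deg C: 64.9 + 41 * (nGC - 16.4) / N.

    Ignores salt, primer concentration and nearest-neighbor effects.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def self_complementary_kmers(seq, k=4):
    """k-mers whose reverse complement also occurs in `seq`.

    A crude flag: such pairs could form a hairpin stem within one
    molecule or pair between two copies of the primer (homodimer).
    """
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(km for km in kmers if reverse_complement(km) in kmers)


def screen_adaptor(candidate, c_primer, max_tm_diff=2.0, k=4):
    """Return (passes, flagged_kmers) for a candidate adaptor primer."""
    tm_close = abs(basic_tm(candidate) - basic_tm(c_primer)) <= max_tm_diff
    flagged = self_complementary_kmers(candidate, k)
    return tm_close and not flagged, flagged


# Hypothetical sequences, not SEQ ID NO: 2 or 3.
print(screen_adaptor("AAACCCAAACCCAAACCCAA", "CCCAAACCCAAACCCAAAAA"))
# (True, [])
```

A production design would replace the basic formula with a nearest-neighbor Tm model and add a database-wide homology search; the sketch only shows how the stated criteria translate into pass/fail checks.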

In one embodiment, the unbiased amplification is non-V region specific amplification. This can reduce or eliminate bias further than attempting to achieve unbiased amplification by devising, for example, a multiplex PCR using V region specific primers.

In one embodiment, the repertoire targeted by the present invention is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence. BCRs are considered prone to mutation, especially in the V region. Thus, accurate analysis of a BCR repertoire is difficult with technologies using V region specific amplification.

In one aspect, the present invention provides a method of analyzing a disease, disorder or condition of the subject based on the TCR or BCR repertoire derived by the repertoire analysis method of the present invention.

In the method of analyzing a disease, disorder or condition of the present invention, the analysis of a disease, disorder or condition of a subject based on a TCR or BCR repertoire derived by the repertoire analysis method of the present invention starts by linking the derived read data, consisting of read types, number of reads, read frequency, V region, J region, C region, CDR3 sequence or the like, with clinical information such as the disease, disorder, or condition to form a database using a spreadsheet such as EXCEL. First, for each derived individual read sequence: 1. TCRs having a known function, such as those of NKT or MAIT cells, are searched for; 2. existing public databases are searched for collation with TCRs or BCRs of known function, such as antigen specificity; 3. the constructed database or an existing public database is searched to associate a common sample origin, property or function with a disease, disorder or condition. Next, for the read sequences in a sample: 1. it is determined whether the frequency of a specific read increases (i.e., whether clonality increases); 2. it is examined whether the usage frequency of a specific V chain or J chain increases or decreases depending on the onset of a disease or the condition of a disorder; 3. it is examined whether the length of the CDR3 sequence in a specific V chain increases or decreases depending on the onset of a disease or the condition of a disorder; 4. the composition or sequence of a CDR3 region that changes depending on the onset of a disease or the condition of a disorder is examined; 5. reads that appear or disappear depending on the onset of a disease or the condition of a disorder are searched for; 6. reads that increase or decrease depending on the onset of a disease or the condition of a disorder are searched for; 7. reads that appear, increase or decrease depending on the onset of a disease or the condition of a disorder are searched for in other samples and associated with a disease, disorder or condition; 8. a diversity index or similarity index is calculated with statistical analysis software such as EstimateS or R (vegan) using data such as the number of samples, read types, or number of reads; and 9. a change in the diversity index or similarity index can be associated with the onset of a disease or the condition of a disorder.
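Items 1 and 8 above can be sketched in a few lines of Python. The indices shown (Shannon, Gini-Simpson, inverse Simpson, and the Morisita-Horn similarity index) are common choices assumed here for illustration; the text does not specify which indices are used. The first three correspond to what R vegan's `diversity()` returns for `index = "shannon"`, `"simpson"` and `"invsimpson"` (natural logarithm by default). Inputs are lists or dicts of read counts; the example data are hypothetical.

```python
import math


def proportions(counts):
    """Relative abundance of each read type, skipping zero counts."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in proportions(counts))


def morisita_horn(a, b):
    """Morisita-Horn similarity of two samples (dicts read -> count).

    0 means no shared reads; 1 means identical relative abundances.
    """
    na, nb = sum(a.values()), sum(b.values())
    da = sum(n * n for n in a.values()) / (na * na)
    db = sum(n * n for n in b.values()) / (nb * nb)
    shared = sum(n * b.get(read, 0) for read, n in a.items())
    return 2.0 * shared / ((da + db) * na * nb)


counts = [1, 1, 1, 1]  # four read types, each seen once
print(round(shannon(counts), 3), simpson(counts), inverse_simpson(counts))
# 1.386 0.75 4.0
```

A monoclonal expansion drives the Shannon and Gini-Simpson indices toward 0 and the inverse Simpson index toward 1, which is why a change in these values (item 9) can track increasing clonality.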

In one embodiment, the disease, disorder or condition of the subject in the analysis method of the present invention includes, but is not limited to, hematological tumor, colorectal cancer, immune status, rheumatoid arthritis, adult T-cell leukemia, T-cell large granular lymphocyte leukemia, idiopathic thrombocytopenic purpura, and the like.

In another embodiment, the present invention provides a method of treating or preventing the disease, disorder or condition of the subject determined by the method of the present invention, comprising: quantitatively associating the disease, disorder or condition of the subject with the TCR or BCR repertoire; and selecting a means for suitable treatment or prevention based on the quantitative association.

In one embodiment, diseases, disorders or conditions of a subject targeted in the method of treating or preventing of the present invention include, but are not limited to, hematological tumor, colorectal cancer, immune status, rheumatoid arthritis, adult T-cell leukemia, T-cell large granular lymphocyte leukemia, idiopathic thrombocytopenic purpura, and the like.

In another aspect, the present invention provides a system (analysis system) for quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database. The system comprises (1) a kit for providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner; (2) an apparatus for determining the nucleic acid sequence comprised in the nucleic acid sample; and (3) an apparatus for calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject. Such a system, and systems comprising one or more additional features explained herein, are referred to as the “repertoire analysis system of the present invention”. The repertoire analysis system of the present invention embodies the “repertoire analysis method of the present invention”.

In another embodiment, the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR), and the apparatus of (2) is configured to be able to determine the nucleic acid sequences by a single sequencing run.

In another embodiment, the single sequencing run is characterized in that at least one of the sequences used as a primer in amplification from the nucleic acid sample to a sample for sequencing has the same sequence as a C region. The method of the present invention can reduce or eliminate the bias that can occur when a plurality of types of sequences are determined by a single sequencing run. Thus, the present invention is especially useful for accurately detecting a TCR or BCR read that occurs at a low frequency.

In another embodiment, the single sequencing run is characterized in that at least one of the sequences used as a primer in amplification from the nucleic acid sample to a sample for sequencing has the same sequence as a nucleic acid sequence encoding a C region or a complementary strand thereof. Such a primer may be furnished in the apparatus, comprised in a kit, or provided separately. Using a primer having the same sequence as a nucleic acid sequence encoding a C region or a complementary strand thereof allows any TCR or BCR to be amplified in the same manner, achieving unbiased amplification.

In another embodiment, the single sequencing run is characterized in being performed with a common adaptor primer. Such a common adaptor primer may be furnished with the apparatus, comprised in a kit or provided separately. In a preferred embodiment, the common adaptor primer is designed to have a base length suitable for amplification, to be unlikely to form homodimers or intramolecular hairpin structures, and to stably form a double strand, and is designed not to be highly homologous with any TCR genetic sequence in the database and/or to have a melting temperature (Tm) at the same level as that of the C region specific primer. Further preferably, a common adaptor primer designed to have neither homodimer nor intramolecular hairpin structures, nor homology with other genes including BCR or TCR genes, is selected. In a specific embodiment, the common adaptor primer is P20EA (SEQ ID NO: 2) and/or P10EA (SEQ ID NO: 3).

In one embodiment, the nucleic acid sequence comprised in the nucleic acid sample provided by the kit of the present invention is amplified in an unbiased manner, where the amplification is not V region specific amplification. This can reduce or eliminate bias further than attempting to achieve unbiased amplification by devising, for example, a multiplex PCR using V region specific primers.

In one embodiment, the repertoire subjected to analysis by the system of the present invention is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence. BCRs are considered prone to mutation, especially in the V region. Thus, accurate analysis of a BCR repertoire is difficult with technologies using V region specific amplification. Use of the system of the present invention allows accurate analysis of a BCR repertoire.

In another aspect, the present invention provides a system (analysis system) for analyzing a disease, disorder or condition of the subject, comprising the analysis system of the present invention and means for analyzing the disease, disorder or condition of the subject based on the TCR or BCR repertoire derived by the system. The means for analyzing a disease, disorder or condition of a subject based on a TCR or BCR repertoire derived by the analysis system of the present invention starts by linking the derived read data, consisting of read types, number of reads, read frequency, V region, J region, C region, CDR3 sequence or the like, with clinical information such as the disease, disorder, or condition to form a database using a spreadsheet such as EXCEL. First, for each derived individual read sequence: 1. TCRs having a known function, such as those of NKT or MAIT cells, are searched for; 2. existing public databases are searched for collation with TCRs or BCRs of known function, such as antigen specificity; 3. the constructed database or an existing public database is searched to associate a common sample origin, property or function with a disease, disorder or condition. Next, for the read sequences in a sample: 1. it is determined whether the frequency of a specific read increases (i.e., whether clonality increases); 2. it is examined whether the usage frequency of a specific V chain or J chain increases or decreases depending on the onset of a disease or the condition of a disorder; 3. it is examined whether the length of the CDR3 sequence in a specific V chain increases or decreases depending on the onset of a disease or the condition of a disorder; 4. the composition or sequence of a CDR3 region that changes depending on the onset of a disease or the condition of a disorder is examined; 5. reads that appear or disappear depending on the onset of a disease or the condition of a disorder are searched for; 6. reads that increase or decrease depending on the onset of a disease or the condition of a disorder are searched for; 7. reads that appear, increase or decrease depending on the onset of a disease or the condition of a disorder are searched for in other samples and associated with a disease, disorder or condition; 8. a diversity index or similarity index is calculated with statistical analysis software such as EstimateS or R (vegan) using data such as the number of samples, read types, or number of reads; and 9. a change in the diversity index or similarity index can be associated with the onset of a disease or the condition of a disorder.
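Items 5 and 6 above (reads that appear, disappear, increase or decrease between, for example, samples taken before and after onset) can be sketched as a comparison of two frequency tables. The two-fold cut-off, the read keys and the frequencies are hypothetical; in practice a read absent from one sample may simply have been missed at the sequencing depth used, so a real analysis would also consider read counts and statistical testing.

```python
def compare_samples(before, after, fold=2.0):
    """Classify reads between two samples.

    `before` and `after` map a read (e.g. a V/J/CDR3 key) to its
    frequency. `fold` is an assumed cut-off for calling a change.
    """
    result = {"appeared": [], "disappeared": [],
              "increased": [], "decreased": []}
    for read in sorted(set(before) | set(after)):
        f0 = before.get(read, 0.0)
        f1 = after.get(read, 0.0)
        if f0 == 0 and f1 > 0:
            result["appeared"].append(read)
        elif f1 == 0 and f0 > 0:
            result["disappeared"].append(read)
        elif f0 > 0 and f1 >= fold * f0:
            result["increased"].append(read)
        elif f1 > 0 and f0 >= fold * f1:
            result["decreased"].append(read)
    return result


pre_onset = {"readX": 0.10, "readY": 0.20, "readZ": 0.30}
post_onset = {"readX": 0.30, "readY": 0.05, "readW": 0.10}
print(compare_samples(pre_onset, post_onset))
# {'appeared': ['readW'], 'disappeared': ['readZ'], 'increased': ['readX'], 'decreased': ['readY']}
```

The same function can be reused for item 7 by passing the reads flagged in one subject as keys to look up in samples from other subjects.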

In one embodiment, the disease, disorder or condition of the subject that can be analyzed by the analysis system of the present invention includes, but is not limited to, hematological tumor, colorectal cancer, immune status, rheumatoid arthritis, adult T-cell leukemia, T-cell large granular lymphocyte leukemia, idiopathic thrombocytopenic purpura, and the like.

In another aspect, the present invention provides a system (treatment system or prevention system) for treating or preventing the disease, disorder or condition of the subject determined by the analysis system of the present invention, comprising: means for quantitatively associating the disease, disorder or condition of the subject with the TCR or BCR repertoire; and means for selecting a suitable means of treatment or prevention based on the quantitative association.

The means for quantitatively associating the disease, disorder or condition of the subject with the TCR or BCR repertoire in the system of the present invention can be implemented, for example, by the following configuration. That is, it can be implemented by reading out information on a repertoire derived by the analysis system of the present invention, reading out information related to a disease, disorder or condition of a subject, and associating the two. From the aggregated read data thus derived, a V region, a J region, and a C region are assigned by collation with existing reference sequences, and the CDR3 sequence is determined. Identical reads are tallied based on the V region, J region and CDR3 sequence. For each unique read (a read with no other identical sequence), the number of reads detected and its ratio to the total number of reads in the sample (frequency) are calculated. This information (read sequence, number of reads, read frequency, V region, J region, C region, or CDR3 sequence) is linked with the clinical information of the subject (medical history, disease name, disease type, degree of progression, severity, HLA type, immune status or the like) to form a database by using a spreadsheet such as EXCEL or software having a database formation feature. Read sequences in a sample are sorted by number of reads or frequency and ranked. Further, the number of reads is tallied for each V region or J region to calculate the usage frequency of each V region or J region. Based on such information: 1. it is determined whether the frequency of a specific read increases (i.e., whether clonality increases); 2. it is examined whether the usage frequency of a specific V chain or J chain increases or decreases depending on the onset of a disease or the condition of a disorder; 3. it is examined whether the length of the CDR3 sequence in a specific V chain increases or decreases depending on the onset of a disease or the condition of a disorder; 4. the composition or sequence of a CDR3 region that changes depending on the onset of a disease or the condition of a disorder is examined; 5. reads that appear or disappear depending on the onset of a disease or the condition of a disorder are searched for; 6. reads that increase or decrease depending on the onset of a disease or the condition of a disorder are searched for; 7. reads that appear, increase or decrease depending on the onset of a disease or the condition of a disorder are searched for in other samples and associated with a disease, disorder or condition; 8. a diversity index or similarity index is calculated with statistical analysis software such as EstimateS or R (vegan) using data such as the number of samples, read types, or number of reads; and 9. a change in the diversity index or similarity index can be associated with the onset of a disease or the condition of a disorder. The means for selecting a suitable means of treatment or prevention based on the quantitative association can have, for example, the following configuration. Specifically, this selection means can be implemented by associating the quantitatively presented data with past or currently available information related to treatment, therapy or prevention, thereby selecting a means that improves prognosis.
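The aggregation described above (collapsing identical reads, computing frequency, ranking, tallying V usage and linking clinical data) might be sketched as follows. The tuple layout, the clinical field names and values, and the second CDR3 sequence are hypothetical; the first clonotype reuses the TRBV29-1/TRBJ2-7 combination named in this specification purely as a label. The frequency of the top-ranked read is one simple way to express clonality for item 1.

```python
from collections import Counter


def build_read_table(reads, clinical):
    """Collapse reads into unique reads, rank them and attach clinical data.

    `reads` is an iterable of (V, J, CDR3) tuples, one per sequenced read;
    `clinical` is a dict of subject metadata (hypothetical field names).
    Returns (rows, v_usage): rows sorted by read count, and V usage
    frequencies.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    rows = []
    # most_common() sorts by count; ties keep first-seen order.
    for rank, ((v, j, cdr3), n) in enumerate(counts.most_common(), start=1):
        rows.append({"rank": rank, "V": v, "J": j, "CDR3": cdr3,
                     "reads": n, "frequency": n / total, **clinical})
    v_usage = Counter()
    for (v, _j, _cdr3), n in counts.items():
        v_usage[v] += n
    return rows, {v: n / total for v, n in v_usage.items()}


sample = ([("TRBV29-1", "TRBJ2-7", "CSVERGGSLGEQYFG")] * 3
          + [("TRBV20-1", "TRBJ1-1", "CSARDTEAFF")])  # second read hypothetical
rows, v_usage = build_read_table(sample, {"subject": "S01", "diagnosis": "T-LGL"})
print(rows[0]["rank"], rows[0]["V"], rows[0]["frequency"])  # 1 TRBV29-1 0.75
print(v_usage)  # {'TRBV29-1': 0.75, 'TRBV20-1': 0.25}
```

Each row is a flat record, so the table can be written to a spreadsheet or database as described above without further reshaping.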

In one embodiment, the disease, disorder or condition of the subject includes, but is not limited to, hematological tumor, colorectal cancer, immune status, rheumatoid arthritis, adult T-cell leukemia, T-cell large granular lymphocyte leukemia, idiopathic thrombocytopenic purpura, and the like.

(Useful Cell, Peptide, and the Like)

In one aspect, the present invention provides a monoclonal T cell related to T cell large granular lymphocytic leukemia (T-LGL) expressing a TCRα comprising TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or a nucleic acid encoding the same and/or a TCRβ comprising TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) or a nucleic acid encoding the same.

As shown in the Examples and elsewhere, this specific T cell has a variety of uses. For instance, it is demonstrated that TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) in TCRα or a nucleic acid encoding the same and/or TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) in TCRβ or a nucleic acid encoding the same can be used as a diagnostic indicator for T cell large granular lymphocytic leukemia (T-LGL). Such peptides and nucleic acids encoding the same can be detected by any technology known in the art. As used herein, “detecting agent” broadly refers to all agents capable of detecting a target of interest (e.g., a peptide, nucleic acid, cell or the like). For instance, in such a method, “detection” or “quantification” of polynucleotide or polypeptide expression can be accomplished by any suitable method, including, for example, immunological measuring methods and measurement of mRNAs, including binding or interaction with a marker detecting agent. Examples of molecular biological measuring methods include northern blot, dot blot, PCR and the like. Examples of immunological measuring methods include ELISA using a microtiter plate, RIA, fluorescent antibody method, luminescence immunoassay (LIA), immunoprecipitation (IP), single radial immunodiffusion (SRID), turbidimetric immunoassay (TIA), western blot, immunohistochemical staining and the like. Quantification methods include ELISA, RIA and the like. Quantification may also be performed by a genetic analysis method using an array (e.g., DNA array, protein array). DNA arrays are outlined extensively in (Ed. by Shujunsha, Saibo Kogaku Bessatsu “DNA Maikuroarei to Saishin PCR ho” [Cellular Engineering, Extra Issue, “DNA Microarrays and Latest PCR Methods”]). Protein arrays are discussed in detail in Nat Genet. 2002 December; 32 Suppl: 526-32. Examples of methods of analyzing gene expression include, but are not limited to, RT-PCR, RACE, SSCP, immunoprecipitation, two-hybrid system, in vitro translation and the like, in addition to the methods discussed above. Such additional analysis methods are described in, for example, Genomu Kaiseki Jikkenho Nakamura Yusuke Labo Manyuaru [Genome Analysis Experimental Methods: Yusuke Nakamura Lab Manual], Ed. by Yusuke Nakamura, Yodosha (2002) and the like. The entire descriptions therein are incorporated herein by reference. As used herein, “amount of expression” refers to the amount of polypeptide, mRNA or the like expressed in a cell, tissue or the like of interest. Examples of such an amount of expression include the amount of expression of the polypeptide of the present invention at the protein level, assessed by any suitable method including immunological measurement methods such as ELISA, RIA, fluorescent antibody method, western blot, and immunohistochemical staining using the antibody of the present invention, and the amount of expression of the polypeptide used in the present invention at the mRNA level, assessed by any suitable method including molecular biological measuring methods such as northern blot, dot blot, and PCR. “Change in the amount of expression” refers to an increase or decrease in the amount of expression of the polypeptide used in the present invention at the protein level or mRNA level, assessed by any suitable method including the above-described immunological or molecular biological measuring methods. A variety of marker-based detection or diagnosis can be performed by measuring the amount of expression of a given marker.

The present invention also provides a diagnostic agent for T cell large granular lymphocytic leukemia (T-LGL) comprising a detecting agent for TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) in TCRα or a nucleic acid encoding the same and/or a detecting agent for TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500) in TCRβ or a nucleic acid encoding the same.
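When detection is based on sequencing data, a detecting step could, for instance, look up the two clonotypes named above in a derived repertoire. The sketch below assumes a hypothetical layout (each chain mapped to a dict of (V, J, CDR3) keys and read frequencies) and uses invented frequencies. It applies no diagnostic cut-off, since none is given here, and it matches the CDR3 strings exactly as written in this text, so output from tools that delimit CDR3 differently may need normalizing first.

```python
# V/J/CDR3 combinations named in the text (SEQ ID NOs: 1450 and 1500).
TLGL_CLONOTYPES = {
    "TRA": ("TRAV10", "TRAJ15", "CVVRATGTALIFG"),
    "TRB": ("TRBV29-1", "TRBJ2-7", "CSVERGGSLGEQYFG"),
}


def tlgl_clonotype_frequencies(repertoire):
    """Look up the read frequency of each listed clonotype.

    `repertoire` maps a chain ("TRA"/"TRB") to a dict of
    (V, J, CDR3) -> read frequency (a hypothetical layout). Exact
    matching only; no diagnostic cut-off is applied here.
    """
    return {chain: repertoire.get(chain, {}).get(key, 0.0)
            for chain, key in TLGL_CLONOTYPES.items()}


repertoire = {
    "TRA": {("TRAV10", "TRAJ15", "CVVRATGTALIFG"): 0.42},
    "TRB": {("TRBV20-1", "TRBJ1-1", "CSARDTEAFF"): 0.10},
}
print(tlgl_clonotype_frequencies(repertoire))  # {'TRA': 0.42, 'TRB': 0.0}
```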

As used herein, “decrease” or “suppression” of an activity or expression product (e.g., protein, transcript (RNA or the like)), or synonyms thereof, refers to a decrease in the amount, quality or effect of a specific activity, transcript or protein, or to activity that decreases the same.

As used herein, “increase” or “activation” of an activity or expression product (e.g., protein, transcript (RNA or the like)), or synonyms thereof, refers to an increase in the amount, quality or effect of a specific activity, transcript or protein, or to activity that increases the same.

Thus, it is understood that various agents with activity can be detected or screened for by using, as an indicator, a regulatory ability such as decrease, suppression, increase or activation of the marker of the present invention.

As used herein, “agent” is used broadly and may be any substance or other element (e.g., energy, radiation, heat, electricity and other forms of energy) as long as the intended objective can be achieved. Examples of such a substance include, but are not limited to, protein, polypeptide, oligopeptide, peptide, polynucleotide, oligonucleotide, nucleotide, nucleic acid (including, for example, DNAs such as cDNA and genomic DNA, and RNAs such as mRNA), polysaccharide, oligosaccharide, lipid, organic small molecule (e.g., hormone, ligand, signal transmitting substance, molecule synthesized by combinatorial chemistry, or small molecule that can be used as a medicine (e.g., small molecule ligand and the like)), and composite molecules thereof. Typical examples of an agent specific to a polynucleotide include, but are not limited to, a polynucleotide having complementarity with a certain sequence homology (e.g., 70% or greater sequence identity) to a sequence of the polynucleotide, a polypeptide such as a transcription factor that binds to a promoter region, and the like. Typical examples of an agent specific to a polypeptide include, but are not limited to, an antibody directed specifically to the polypeptide or a derivative or analog thereof (e.g., single chain antibody), a specific ligand or receptor when the polypeptide is a receptor or ligand, a substrate when the polypeptide is an enzyme, and the like.

As used herein, “detecting agent” broadly refers to all agents capable of detecting a target of interest (e.g., normal cells (normal corneal endothelial cells) or transformed cells (e.g., transformed corneal endothelial cells)).

As used herein, “diagnostic agent” broadly refers to all agents capable of diagnosing a condition of interest (e.g., disease or the like).

The detecting agent of the present invention may be a complex or composite molecule in which another substance (e.g., a label or the like) is bound to a portion enabling detection (e.g., an antibody or the like). As used herein, “complex” or “composite molecule” refers to any construct comprising two or more portions. For instance, when one portion is a polypeptide, the other portion may be a polypeptide or another substance (e.g., sugar, lipid, nucleic acid, other carbohydrate or the like). As used herein, two or more constituent portions of a complex may be bound by a covalent bond or any other bond (e.g., hydrogen bond, ionic bond, hydrophobic interaction, Van der Waals force or the like). When two or more portions are polypeptides, the complex may be called a chimeric polypeptide. Thus, “complex” as used herein includes molecules formed by linking a plurality of types of molecules such as a polypeptide, polynucleotide, lipid, sugar, or small molecule.

As used herein, “interaction” between two substances refers to a force (e.g., intermolecular force (Van der Waals force), hydrogen bond, hydrophobic interaction, or the like) acting between one substance and the other. Generally, two substances that have interacted are in a conjugated or bound state.

As used herein, the term “bond” refers to a physical or chemical interaction between two substances or between combinations thereof. Bonds include ionic bonds, non-ionic bonds, hydrogen bonds, Van der Waals bonds, hydrophobic interactions and the like. A physical interaction (bond) may be direct or indirect. An indirect physical interaction (bond) is mediated by, or is due to an effect of, another protein or compound. A direct bond refers to an interaction that does not occur through or due to an effect of another protein or compound and does not substantially involve another intermediate. The degree of expression of a marker of the present invention or the like can be measured by measuring a bond or interaction.

Thus, an “agent” (such as a detecting agent or the like) that “specifically” interacts with (or binds to) a biological agent such as a polynucleotide or a polypeptide, as used herein, encompasses agents whose affinity to that biological agent is typically similar or higher, preferably significantly (e.g., statistically significantly) higher, than their affinity to other unrelated polynucleotides or polypeptides (especially those with less than 30% identity). Such affinity can be measured by, for example, a hybridization assay, binding assay or the like.

As used herein, “specific” interaction (or bond) of a first substance or agent with a second substance or agent refers to the first substance or agent interacting with (or binding to) the second substance or agent at a higher level of affinity than to substances or agents other than the second substance or agent (especially other substances or agents in a sample comprising the second substance or agent). Examples of an interaction (or bond) specific to a substance or agent include, but are not limited to, a ligand-receptor reaction, hybridization in a nucleic acid, an antigen-antibody reaction in a protein, an enzyme-substrate reaction and the like, and, when both a nucleic acid and a protein are involved, a reaction between a transcription factor and a binding site of the transcription factor and the like, a protein-lipid interaction, a nucleic acid-lipid interaction and the like. Thus, when substances or agents are both nucleic acids, a first substance or agent “specifically interacting” with a second substance or agent encompasses the first substance or agent having at least partial complementarity to the second substance or agent. Further, examples of a first substance or agent “specifically” interacting with (or binding to) a second substance or agent when both are proteins include, but are not limited to, interaction by an antigen-antibody reaction, interaction by a receptor-ligand reaction, enzyme-substrate interaction and the like. When the two types of substances or agents include a protein and a nucleic acid, a first substance or agent “specifically” interacting with (or binding to) a second substance or agent encompasses an interaction (or bond) between a transcription factor and a binding region of a nucleic acid molecule targeted by the transcription factor.

As used herein, “antibody” broadly encompasses a polyclonal antibody, monoclonal antibody, multispecific antibody, chimeric antibody, and anti-idiotype antibody, and fragments thereof such as Fv fragment, Fab′ fragment, F(ab′)₂ and Fab fragment, and other conjugates or functional equivalents produced by recombination (e.g., chimeric antibody, humanized antibody, multifunctional antibody, bispecific or oligospecific antibody, single chain antibody, scFv, diabody, sc(Fv)₂ (single chain (Fv)₂), and scFv-Fc). Furthermore, such an antibody may be covalently bonded or recombinantly fused to an enzyme, such as alkaline phosphatase, horseradish peroxidase, β-galactosidase or the like. The antibodies to various reads used in the present invention may be of any origin, type, shape or the like as long as they bind to their respective specific reads. Specifically, known antibodies such as a non-human animal antibody (e.g., mouse antibody, rat antibody, and camel antibody), human antibody, chimeric antibody and humanized antibody can be used. The present invention can use a monoclonal or polyclonal antibody, but a monoclonal antibody is preferred. It is preferred that the binding of an antibody to a specific read is specific binding.

As used herein, “antigen” refers to any substance that can be specifically bound by an antibody molecule. As used herein, “immunogen” refers to an antigen that can initiate lymphocyte activation leading to an antigen specific immune response. As used herein, “epitope” or “antigenic determinant” refers to a site in an antigen molecule to which an antibody or a lymphocyte receptor binds. Methods of determining an epitope are well known in the art. Such an epitope can be determined by those skilled in the art by using well-known, conventional techniques when the primary amino acid or nucleic acid sequence is provided.

As used herein, “means” refers to anything that can be a tool for accomplishing an objective (e.g., detection, diagnosis, therapy).

For an antibody used herein, it is understood that an antibody of any specificity may be used as long as false positive reactions are reduced. Thus, an antibody used in the present invention may be a polyclonal antibody or a monoclonal antibody.

The detecting agent, diagnostic agent or other medicines of the present invention can be in the form of a probe or a primer. The probe and primer of the present invention can specifically hybridize to a specific read. As described herein, expression of a specific read is, for example, an indicator of the presence of colorectal cancer and is useful as an indicator of disease severity.

As used herein, “(nucleic acid) primer” refers to a substance required for initiating a reaction of a polymeric compound to be synthesized in a polymer synthesizing enzyme reaction. A synthetic reaction of a nucleic acid molecule can use a nucleic acid molecule (e.g., DNA, RNA or the like) complementary to a portion of the sequence of the polymeric compound to be synthesized. A primer can be used herein as a marker detecting means.

Examples of a nucleic acid molecule generally used as a primer include those having a nucleic acid sequence with a length of at least 8 contiguous nucleotides, which is complementary to a nucleic acid sequence of a gene of interest (e.g., a marker of the present invention). Such a nucleic acid sequence may be a nucleic acid sequence with a length of preferably at least 9 contiguous nucleotides, more preferably at least 10 contiguous nucleotides, still more preferably at least 11 contiguous nucleotides, at least 12 contiguous nucleotides, at least 13 contiguous nucleotides, at least 14 contiguous nucleotides, at least 15 contiguous nucleotides, at least 16 contiguous nucleotides, at least 17 contiguous nucleotides, at least 18 contiguous nucleotides, at least 19 contiguous nucleotides, at least 20 contiguous nucleotides, at least contiguous nucleotides, at least 30 contiguous nucleotides, at least 40 contiguous nucleotides, or at least 50 contiguous nucleotides. A nucleic acid sequence used as a primer comprises a nucleic acid sequence that is at least 70% homologous, more preferably at least 80% homologous, still more preferably at least 90% homologous, or at least 95% homologous to the aforementioned sequence. A sequence suitable as a primer may vary depending on the properties of the sequence intended for synthesis (amplification). However, those skilled in the art are capable of designing an appropriate primer in accordance with the intended sequence. Design of such a primer is well known in the art and may be performed manually or by using a computer program (e.g., LASERGENE, PrimerSelect, or DNAStar).
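The length and homology criteria above can be checked with a simple ungapped comparison, sketched here under the assumption that the candidate and the reference segment (written in the orientation the primer should match) have already been aligned end to end. Homology is more commonly assessed with gapped alignment tools, which this sketch does not attempt; the function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_primer_criteria(candidate, reference, min_len=8, min_identity=70.0):
    """Check the minimum length and identity thresholds given in the text."""
    return (len(candidate) >= min_len
            and percent_identity(candidate, reference) >= min_identity)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))       # 90.0
print(meets_primer_criteria("ACGTACGTAC", "ACGTACGTAA"))  # True
```

The length check runs first, so a candidate shorter than the minimum is rejected before the identity comparison is attempted.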

The primers according to the present invention can be used as a primer set consisting of two or more types of primers.

The primers and primer set according to the present invention can be used, in accordance with common practice, in any known method of detecting a gene of interest that utilizes a nucleic acid amplification method such as PCR, RT-PCR, real-time PCR, in situ PCR, or LAMP.

As used herein, “probe” refers to a substance used as a means of searching in a biological experiment such as in vitro and/or in vivo screening. Examples thereof include, but are not limited to, a nucleic acid molecule comprising a specific base sequence, a peptide comprising a specific amino acid sequence, a specific antibody, a fragment thereof and the like. As used herein, a probe can be used as a marker detecting means.

A nucleic acid molecule generally used as a probe includes those having a nucleic acid sequence with a length of at least 8 contiguous nucleotides, which is homologous or complementary to a nucleic acid sequence of a gene of interest. Such a nucleic acid sequence may have a length of preferably at least 9 contiguous nucleotides, more preferably at least 10 contiguous nucleotides, still more preferably at least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least 30, at least 40, or at least 50 contiguous nucleotides. A nucleic acid sequence used as a probe comprises a nucleic acid sequence that is at least about 70% homologous, more preferably at least about 80% homologous, still more preferably at least about 90% homologous, or at least about 95% homologous to the aforementioned sequence.

In one embodiment, the detecting agent of the present invention may be labeled. Alternatively, the detecting agent of the present invention may be bound to a tag.

As used herein, “label” refers to an entity (e.g., a substance, energy, an electromagnetic wave or the like) for distinguishing a molecule or substance of interest from others. Labeling methods include a radioisotope (RI) method, a fluorescence method, a biotin method, a chemiluminescence method and the like. When a plurality of markers of the present invention, or agents or means for capturing the same, are labeled by a fluorescence method, labeling is performed with labeling substances having different fluorescence emission maximum wavelengths. The difference in fluorescence emission maximum wavelengths is preferably 10 nm or greater. When labeling a ligand, any label that does not affect its function can be used; however, Alexa™ Fluor is desirable as a fluorescent substance. Alexa™ Fluor is a series of water-soluble fluorescent dyes obtained by modifying coumarin, rhodamine, fluorescein, cyanine or the like, and the series covers a wide range of fluorescence wavelengths. Relative to other fluorescent dyes of corresponding wavelength, Alexa™ Fluor dyes are very stable and bright and have low pH sensitivity. Combinations of fluorescent dyes whose fluorescence maximum wavelengths differ by 10 nm or more include the combination of Alexa™ 555 and Alexa™ 633, the combination of Alexa™ 488 and Alexa™ 555, and the like. When a nucleic acid is labeled, any substance that can bind to its base portion can be used; however, it is preferable to use a cyanine dye (e.g., Cy3, Cy5 or the like of the CyDye™ series), a rhodamine 6G reagent, N-acetoxy-N2-acetylaminofluorene (AAF), AAIF (an iodine derivative of AAF) or the like. Examples of fluorescent substances whose fluorescence emission maximum wavelengths differ by 10 nm or more include the combination of Cy5 and a rhodamine 6G reagent, the combination of Cy3 and fluorescein, the combination of a rhodamine 6G reagent and fluorescein, and the like.
The present invention can utilize such a label to make a subject of interest detectable by the detecting means to be used. Such alteration is known in the art, and those skilled in the art can appropriately carry out such a method in accordance with the label and the subject of interest.

As used herein, “tag” refers to a substance for distinguishing a molecule by a specific recognition mechanism such as receptor-ligand binding; more specifically, a substance that serves as a binding partner for binding a specific substance (e.g., in a relationship such as biotin-avidin or biotin-streptavidin). A tag can be encompassed in the scope of “label”. Accordingly, a specific substance to which a tag is bound can be distinguished by contacting it with a substrate to which the binding partner of the tag sequence is bound. Such tags and labels are known in the art. Typical tag sequences include, but are not limited to, myc tag, His tag, HA, Avi tag and the like. Such a tag may be bound to the marker or marker detecting agent of the present invention.

The method of the present invention can be carried out by contacting the detecting agent or diagnostic agent of the present invention with a sample of interest to measure whether a target read of interest, or the gene of that read, is present in the sample, or the level or amount thereof.

As used herein, “contact(ed)” refers to bringing a substance into physical proximity, either directly or indirectly, with a polypeptide or a polynucleotide that can function as the marker, detecting agent, diagnostic agent, ligand or the like of the present invention. The polypeptide or polynucleotide can be present in many buffer solutions, salts, solutions and the like. Contact includes placing a compound in, for example, a beaker, microtiter plate, cell culture flask, microarray (e.g., gene chip) or the like comprising a polypeptide, a nucleic acid molecule encoding the polypeptide, or a fragment thereof.

In another aspect, the present invention provides a peptide which is a novel invariant TCR, comprising any one of the sequences set forth in SEQ ID NOs: 1627-1647. Such a peptide can be used as an invariant TCR and applied as various indicators (e.g., an indicator of a disease or the like).

In still another aspect, the present invention provides a TCR peptide of a mucosal-associated invariant T (MAIT) cell, comprising a sequence selected from the group consisting of SEQ ID NOs: 1648-1651, 1653-1654, 1666-1667, 1844-1848, and 1851, or a nucleic acid encoding such a peptide. Such a peptide and nucleic acid can be used as markers of mucosal-associated invariant T (MAIT) cells and applied as various indicators (e.g., an indicator of a disease or the like). In one specific embodiment, a peptide that is a TCR of a MAIT cell of the present invention, or a nucleic acid encoding such a peptide, can be used as a diagnostic indicator of colorectal cancer.

In another aspect, the present invention provides a peptide that is a TCR of a natural killer T (NKT) cell, comprising the sequence set forth in SEQ ID NO: 1668, and a nucleic acid encoding the peptide. In one specific embodiment, the peptide and the nucleic acid encoding it can be used as a diagnostic indicator of colorectal cancer.

In another aspect, the present invention provides a colorectal cancer-specific peptide comprising a sequence selected from the group consisting of SEQ ID NOs: 1652, 1655-1665, 1669-1843, 1849-1850, and 1852-1860 and a nucleic acid encoding the same. In one specific embodiment, such a peptide and nucleic acid encoding the same can be used as a diagnostic indicator of colorectal cancer.

In a further aspect, the present invention provides a colorectal cancer-specific peptide comprising a sequence selected from the group consisting of SEQ ID NOs: 1861-1865 and 1867-1909 and a nucleic acid encoding the same. In one specific embodiment, such a peptide and nucleic acid encoding the same can be used as a diagnostic indicator of colorectal cancer.

In another aspect, the present invention provides a cell population that induces, at a high frequency, T cells having a peptide comprising a sequence selected from the group consisting of SEQ ID NOs: 1652, 1655-1665, 1669-1843, 1849-1850, and 1852-1860 and SEQ ID NOs: 1861-1865 and 1867-1909, or a nucleic acid sequence encoding the peptide; a T cell line having such a peptide; or a recombinantly expressed T cell having such a peptide. Such a cell population, cell line, cell, colorectal cancer-specific TCR peptide, or nucleic acid encoding the peptide is useful in diagnosis or therapy. For diagnosis, colorectal cancer can be discovered, or its pathological condition or prognosis can be predicted, by examining whether the sequence is found only in colorectal cancer patients, is observed more often in colorectal cancer patients, or accumulates in the cancer tissue of the same patient. For therapy of colorectal cancer, it is possible to utilize a cell population inducing T cells with a colorectal cancer-specific sequence at a high frequency, a T cell line with a colorectal cancer-specific sequence, or T cells (lymphocytes) artificially made to express a colorectal cancer-specific sequence (as reference documents, see 1: Uttenthal B J, Chua I, Morris E C, Stauss H J. Challenges in T cell receptor gene therapy. J Gene Med. 2012 June; 14(6): 386-99. doi: 10.1002/jgm.2637. Review. PubMed PMID: 22610778; 2: Linnemann C, Schumacher T N, Bendle G M. T-cell receptor gene therapy: critical parameters for clinical success. J Invest Dermatol. 2011 September; 131(9): 1806-16. doi: 10.1038/jid.2011.160. Epub 2011 Jun. 16. Review. PubMed PMID: 21677669; 3: Lagisetty K H, Morgan R A. Cancer therapy with genetically-modified T cells for the treatment of melanoma. J Gene Med. 2012 June; 14(6): 400-4. doi: 10.1002/jgm.2636. Review. PubMed PMID: 22610729). Thus, the present invention provides a therapeutic agent or a prophylactic agent for colorectal cancer, comprising the above-described cell population, T cell line, or T cell.

(Application) The present invention can be used to compute, with software, the base sequences (reads) of TCR or BCR genes identified by large-scale sequencing and their frequencies of appearance, and to present them as a list, a distribution or a graph. Based on such information, a change in the repertoire is detected by using the various indicators described below, and an association with a disease or disorder can be found based on such a change.

In one aspect, the present invention provides a method of detecting the usage frequency of V genes by using the analysis method or the analysis system of the present invention. The V gene of each read can be identified to calculate the percentage of each V gene with respect to all TCR or BCR genes. An increase or decrease in the usage frequency of a V gene associated with a disease or pathological condition can thereby be found.

In another aspect, the present invention provides a method of detecting the usage frequency of J genes by using the analysis method or the analysis system of the present invention. The J gene of each read can be identified to calculate the percentage of each J gene with respect to all TCR or BCR genes. An increase or decrease in the usage frequency of a J gene associated with a disease or pathological condition can thereby be found.

In another aspect, the present invention provides a method of subtype frequency analysis (BCR) by using the analysis method or the analysis system of the present invention. The frequencies of the subtypes IgA1, IgA2, IgG1, IgG2, IgG3, and IgG4 can be calculated based on sequencing of the C region. An increase or decrease in a specific subtype associated with a disease or pathological condition can thereby be found.
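The V gene, J gene, and C-region subtype analyses above all reduce to the same calculation: count the reads assigned to each label and express each count as a percentage of all assigned reads. A minimal Python sketch follows, assuming each read has already been assigned a V gene, J gene, or subtype name by an upstream alignment step (not shown). Reads without an assignment are passed as `None` and, as an assumption of this sketch, excluded from the denominator. The helper `usage_change` is a hypothetical way to express an increase or decrease relative to a reference sample.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def usage_percentages(calls: Iterable[Optional[str]]) -> Dict[str, float]:
    """Percentage of assigned reads carrying each label (V gene, J gene,
    or C-region subtype). Unassigned reads (None) are skipped."""
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def usage_change(sample: Dict[str, float],
                 reference: Dict[str, float]) -> Dict[str, float]:
    """Difference in percentage points (sample minus reference) per label;
    a label missing from one side is treated as 0%."""
    labels = set(sample) | set(reference)
    return {g: sample.get(g, 0.0) - reference.get(g, 0.0) for g in labels}
```

For example, `usage_percentages(["TRBV20-1", "TRBV20-1", "TRBV5-1", "TRBV6-5"])` gives 50.0 for TRBV20-1 and 25.0 for each of the others; the gene names are placeholders, not data from this disclosure. The same function applied to C-region calls such as `"IgG1"` or `"IgA1"` yields subtype frequencies.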

In another aspect, the present invention provides a method of analyzing the pattern of CDR3 sequence lengths by using the analysis method or the analysis system of the present invention. The CDR3 base sequence length of each read can be calculated to obtain the length distribution. Normal TCRs or BCRs exhibit a peak pattern resembling a normal distribution. An association with a disease or pathological condition can be found by detecting a peak that deviates from the normal distribution.
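One simple way to flag a peak that deviates from a normal distribution is sketched below in Python. The criterion is an assumption of this example, not one stated above: a normal curve is fitted to the observed CDR3 nucleotide lengths (mean and population standard deviation), and any length whose observed share exceeds twice the share predicted by the fitted curve is reported. The bin width of 3 nucleotides assumes in-frame CDR3s, whose lengths differ by whole codons; `ratio` and `bin_width` are illustrative parameters.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev
from typing import Dict, List, Sequence


def cdr3_length_distribution(cdr3_nt: Sequence[str]) -> Dict[int, float]:
    """Fraction of reads at each CDR3 nucleotide length, sorted by length."""
    counts = Counter(len(s) for s in cdr3_nt)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def deviating_peaks(cdr3_nt: Sequence[str], ratio: float = 2.0,
                    bin_width: int = 3) -> List[int]:
    """Lengths whose observed fraction exceeds `ratio` times the fraction
    expected under a normal curve fitted to all lengths. bin_width=3
    assumes in-frame CDR3s, whose lengths step by one codon."""
    lengths = [len(s) for s in cdr3_nt]
    dist = cdr3_length_distribution(cdr3_nt)
    sigma = pstdev(lengths)
    if sigma == 0:            # every read has the same length: one sharp peak
        return list(dist)
    fitted = NormalDist(mean(lengths), sigma)
    half = bin_width / 2
    flagged = []
    for length, observed in dist.items():
        # Probability mass of the fitted curve within this length bin.
        expected = fitted.cdf(length + half) - fitted.cdf(length - half)
        if observed > ratio * expected:
            flagged.append(length)
    return flagged
```

With 20 toy reads whose lengths form a bell shape around 42 nt (1, 3, 6, 3, and 1 reads at 36, 39, 42, 45, and 48 nt) plus six reads at 54 nt, only 54 is reported. Because the curve is fitted to the same data it tests, a very dominant clone also widens the fit; comparing against a curve fitted to a normal reference sample is a reasonable alternative.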

In another aspect, the present invention provides a method of analyzing the clonality of TCRs or BCRs by using the analysis method or the analysis system of the present invention. Reads having the same sequence are grouped based on the V sequence, J sequence, and CDR3 sequence of each read, and the number of copies of each group is calculated. A read present at a high frequency can be found by calculating the number of copies of each read as a percentage of all reads. The degree of clonality is assessed by sorting the reads in descending order of frequency of appearance and comparing the percentage or number of high-frequency reads with that of a normal sample. A change in TCR or BCR clonality associated with a disease or pathological condition is thereby examined. The degree of clonality can be used particularly in detecting leukemic cells or the like.
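The clonality calculation described above (grouping reads by V, J, and CDR3, counting copies, and ranking by frequency) can be sketched in Python as follows. Representing a clonotype as a `(V gene, J gene, CDR3)` tuple and summarizing clonality as the share of reads held by the top n clonotypes are assumptions of this example; the text leaves the exact summary statistic open.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Clonotype = Tuple[str, str, str]  # (V gene, J gene, CDR3 sequence)


def ranked_clonotypes(reads: Sequence[Clonotype]) -> List[Tuple[Clonotype, int, float]]:
    """Group identical (V, J, CDR3) reads, then rank by copy number.
    Returns (clonotype, copies, percent of all reads), most frequent first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(clone, n, 100.0 * n / total) for clone, n in counts.most_common()]


def top_n_share(reads: Sequence[Clonotype], n: int = 10) -> float:
    """Percent of all reads carried by the n most frequent clonotypes."""
    return sum(pct for _, _, pct in ranked_clonotypes(reads)[:n])
```

With six copies of one clonotype, three of a second, and one of a third (toy data), the top clonotype carries 60.0% of reads and the top two carry 90.0%; the same figures computed for a normal sample provide the comparison.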

In another aspect, the present invention provides a method of extracting overlapping reads by using the analysis method or the analysis system of the present invention. Reads of samples classified by a specific disease, disease type, pathological condition, tissue, or genotype (HLA or the like) are searched to extract TCR or BCR reads that overlap between samples. A TCR or BCR gene associated with a condition of a disease or disorder can thereby be found. It is possible to identify disease-specific T cells involved in the pathology of an autoimmune disease, B cells producing a disease-associated antibody, cancer-specific T cells attacking cancer cells, or the like.
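Extracting reads that overlap between samples amounts to counting, for each clonotype, how many samples contain it. The Python sketch below assumes each sample has already been reduced to its clonotype keys (for example, CDR3 sequences or `(V, J, CDR3)` tuples). The `min_samples` threshold and the optional subtraction of clonotypes seen in control samples are illustrative choices, not steps specified above.

```python
from collections import Counter
from typing import Hashable, Iterable, Mapping, Set


def shared_clonotypes(samples: Mapping[str, Iterable[Hashable]],
                      min_samples: int) -> Set[Hashable]:
    """Clonotypes observed in at least `min_samples` different samples.
    Each sample counts once per clonotype, however many reads it has."""
    presence: Counter = Counter()
    for clones in samples.values():
        presence.update(set(clones))
    return {clone for clone, k in presence.items() if k >= min_samples}


def group_specific(group: Mapping[str, Iterable[Hashable]],
                   controls: Mapping[str, Iterable[Hashable]],
                   min_samples: int) -> Set[Hashable]:
    """Clonotypes shared within `group` but absent from every control sample."""
    seen_in_controls: Set[Hashable] = set()
    for clones in controls.values():
        seen_in_controls.update(clones)
    return shared_clonotypes(group, min_samples) - seen_in_controls
```

For instance, with hypothetical patient samples P1 = {CASSA, CASSB, CASSC}, P2 = {CASSA, CASSC, CASSD}, P3 = {CASSA, CASSE} and one control {CASSC, CASSF}, `shared_clonotypes(patients, 2)` returns {CASSA, CASSC} and `group_specific(patients, controls, 2)` returns {CASSA}.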

In another aspect, the present invention provides a method of searching for a disease-specific TCR or BCR clone by using the analysis method or the analysis system of the present invention. The progression or amelioration of a pathological condition, or the onset of a disease, can be predicted by searching a test sample for TCR or BCR reads associated with a specific condition of a disorder or disease and determining their appearance or disappearance, or their increase or decrease.
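Following candidate disease-associated clonotypes through serial samples can be sketched in Python as below. The example assumes each time point is a list of clonotype keys, one per read, and labels a change simply by comparing percentages; any threshold for a meaningful increase or decrease would have to be chosen separately and is not specified here. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Sequence


def track_clones(timepoints: Mapping[str, Sequence[Hashable]],
                 targets: Iterable[Hashable]) -> Dict[str, Dict[Hashable, float]]:
    """Percent of reads matching each target clonotype at each time point."""
    target_list = list(targets)
    result: Dict[str, Dict[Hashable, float]] = {}
    for label, reads in timepoints.items():
        counts = Counter(reads)
        total = len(reads)
        result[label] = {t: (100.0 * counts[t] / total if total else 0.0)
                         for t in target_list}
    return result


def classify_change(before: float, after: float) -> str:
    """Label the change in a clonotype's share between two time points."""
    if before == 0 and after > 0:
        return "appeared"
    if before > 0 and after == 0:
        return "disappeared"
    if after > before:
        return "increased"
    if after < before:
        return "decreased"
    return "unchanged"
```

With toy data in which clonotype X makes up 2 of 10 reads at the first time point and 5 of 10 at the second, X goes from 20.0% to 50.0% ("increased"), while a clonotype Z absent at first and present in 5 of 10 later reads is labelled "appeared".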

In another aspect, the present invention provides a method of analyzing a subject with a diversity index by using the analysis method or the analysis system of the present invention. Alternatively, the present invention provides a method of assisting the analysis of a subject with a diversity index by using the analysis method or the analysis system of the present invention. Read sequences identified based on the CDR3 sequence are counted, and the number of read types and the number of individuals (reads) are calculated to form an index of the diversity of the TCR or BCR repertoire. The Shannon-Wiener diversity index (H′), Simpson's diversity index (λ, 1−λ, or 1/λ), Pielou's evenness index (J′), the Chao1 index or the like is used to assess diversity by comparison with a normal sample. The index can be utilized as an indicator for measuring the degree of recovery of the immune system after bone marrow transplantation. Further, the index can be utilized as an indicator for detecting abnormalities in immune system cells accompanying a hematopoietic tumor.

In one embodiment, a method of analyzing a subject with a diversity index uses the diversity index as an indicator for measuring the degree of recovery of the immune system after bone marrow transplantation or as an indicator for detecting abnormality in a cell of the immune system accompanying a hematopoietic tumor. Such analysis using a diversity index was difficult with a conventional system.

Various diversity indices can be calculated by using an EXCEL spreadsheet or software such as ESTIMATES (Colwell, R. K. et al. Journal of Plant Ecology 5: 3-21.) or the R package vegan from data for the number of samples, read types, or the number of reads. The Shannon-Wiener diversity index (H′), Simpson's diversity index (λ, 1−λ, or 1/λ), Pielou's evenness index (J′) and the Chao1 index are given by the mathematical equations shown below, where N is the total number of reads, n_i is the number of reads of read type i, and S is the number of read types.

Shannon-Weaver index H′

$H' = -\sum_{i=1}^{S} \frac{n_i}{N} \ln \frac{n_i}{N} \qquad [\text{Numeral } 1]$

Simpson's index λ

$\lambda = \sum_{i=1}^{S} \frac{n_i (n_i - 1)}{N (N - 1)}, \qquad 1 - \lambda = 1 - \sum_{i=1}^{S} \frac{n_i (n_i - 1)}{N (N - 1)} \qquad [\text{Numeral } 2]$

Inverse Simpson's index

$\frac{1}{\lambda} \qquad [\text{Numeral } 3]$

Pielou's J′

$J' = \frac{H'}{\ln S} \qquad [\text{Numeral } 4]$

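Chao1 index S_chao1, where S_obs is the total number of read types, F₁ is the number of singleton reads (read types observed exactly once), and F₂ is the number of doubleton reads (read types observed exactly twice)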

$S_{\mathrm{chao1}} = S_{\mathrm{obs}} + \left(\frac{N - 1}{N}\right) \frac{F_1 (F_1 - 1)}{2 (F_2 + 1)} \qquad [\text{Numeral } 5]$
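A minimal Python sketch of Numerals 1 to 5 follows, taking one read count per read type. Assumptions: natural logarithms throughout (so Pielou's J′ divides by ln S, matching the ln in H′); N, the total number of reads, is used in the Chao1 correction factor; and Chao1 takes the standard bias-corrected form, in which the correction term is added to S_obs so that the estimate is never below the observed number of read types. The function name and dictionary keys are illustrative, not those of ESTIMATES or vegan.

```python
import math
from typing import Dict, Sequence


def diversity_indices(counts: Sequence[int]) -> Dict[str, float]:
    """Diversity indices from read counts n_i, one count per read type
    (e.g., per unique CDR3). N = total reads, S = number of read types."""
    n = [c for c in counts if c > 0]
    N = sum(n)
    S = len(n)
    if N < 2:
        raise ValueError("need at least two reads")
    shannon = -sum((c / N) * math.log(c / N) for c in n)      # H' (Numeral 1)
    lam = sum(c * (c - 1) for c in n) / (N * (N - 1))         # lambda (Numeral 2)
    f1 = sum(1 for c in n if c == 1)                          # singletons F1
    f2 = sum(1 for c in n if c == 2)                          # doubletons F2
    return {
        "shannon_H": shannon,
        "simpson_lambda": lam,
        "simpson_1_minus_lambda": 1.0 - lam,
        "inverse_simpson": 1.0 / lam if lam > 0 else math.inf,     # Numeral 3
        "pielou_J": shannon / math.log(S) if S > 1 else math.nan,  # Numeral 4
        "chao1": S + ((N - 1) / N) * f1 * (f1 - 1) / (2 * (f2 + 1)),
    }
```

For toy counts [5, 3, 1, 1] (N = 10, S = 4), the sketch gives H′ ≈ 1.168, 1−λ ≈ 0.711, 1/λ ≈ 3.46, J′ ≈ 0.843, and Chao1 = 4.9 (two singletons, no doubletons).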

In another aspect, the present invention provides a method of analyzing a subject with a similarity index by using the analysis method or the analysis system of the present invention. Alternatively, the present invention provides a method of assisting the analysis of a subject with a similarity index by using the analysis method or the analysis system of the present invention. The number of individuals (reads) and the number of types of read sequences identified based on the CDR3 sequence are calculated to find the degree of similarity of the TCR or BCR repertoire between the samples to be compared. The Morisita-Horn index, Kimoto's Cπ index, or Pianka's α index is used to find the degree of similarity between samples. Such an index can be utilized to assess the degree of similarity of repertoires between matching and mismatching HLA types, or the degree of similarity of repertoires between a recipient and a donor after bone marrow transplantation.

In one embodiment, the similarity index is used to assess the degree of similarity of repertoires between matching and mismatching HLA types, or the degree of similarity of repertoires between a recipient and a donor after bone marrow transplantation. Such analysis using a similarity index was difficult with a conventional system. Various similarity indices can be calculated with ESTIMATES (Colwell, R. K. et al. Journal of Plant Ecology 5: 3-21.) or the R package vegan by using the following mathematical equations. The Morisita-Horn index, Kimoto's Cπ index, and Pianka's α index are given by the mathematical equations shown below.

Morisita-Horn index, where x_i is the number of times read i appears among all X reads from one sample, y_i is the number of times read i appears among all Y reads from the other sample, and S is the number of unique reads.

$C_{MH} = \frac{2 \sum_{i=1}^{S} x_i y_i}{\left( \frac{\sum_{i=1}^{S} x_i^2}{X^2} + \frac{\sum_{i=1}^{S} y_i^2}{Y^2} \right) XY} \qquad [\text{Numeral } 6]$

Kimoto's Cπ index

$C_{\pi} = \frac{2 \sum_{i=1}^{S} x_i y_i}{\left( \sum_{i=1}^{S} p_{xi}^2 + \sum_{i=1}^{S} p_{yi}^2 \right) XY} \qquad [\text{Numeral } 7]$

$p_{xi} = \frac{x_i}{X}, \quad p_{yi} = \frac{y_i}{Y} \qquad [\text{Numeral } 7\text{-}1]$

Pianka's α index

$\alpha = \frac{\sum_{i=1}^{S} p_{xi} p_{yi}}{\sqrt{\sum_{i=1}^{S} p_{xi}^2 \sum_{i=1}^{S} p_{yi}^2}} \qquad [\text{Numeral } 8]$
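The three similarity indices can be computed directly from two count tables. A minimal Python sketch of Numerals 6 to 8 follows, assuming each sample is given as a mapping from read type to read count, with read types absent from a sample counted as zero. The function and key names are illustrative, not those of ESTIMATES or vegan.

```python
import math
from typing import Dict, Hashable, Mapping


def similarity_indices(x: Mapping[Hashable, int],
                       y: Mapping[Hashable, int]) -> Dict[str, float]:
    """Morisita-Horn, Kimoto's C-pi and Pianka's alpha between two samples.
    x, y map each read type to its read count in that sample."""
    X, Y = sum(x.values()), sum(y.values())
    if X == 0 or Y == 0:
        raise ValueError("both samples need at least one read")
    types = set(x) | set(y)
    sxy = sum(x.get(t, 0) * y.get(t, 0) for t in types)
    sxx = sum(v * v for v in x.values())
    syy = sum(v * v for v in y.values())
    px = {t: x.get(t, 0) / X for t in types}   # p_xi (Numeral 7-1)
    py = {t: y.get(t, 0) / Y for t in types}   # p_yi (Numeral 7-1)
    spx2 = sum(p * p for p in px.values())
    spy2 = sum(p * p for p in py.values())
    return {
        "morisita_horn": 2 * sxy / ((sxx / X**2 + syy / Y**2) * X * Y),
        "kimoto_c_pi": 2 * sxy / ((spx2 + spy2) * X * Y),
        "pianka_alpha": sum(px[t] * py[t] for t in types) / math.sqrt(spx2 * spy2),
    }
```

Two useful checks: identical samples give 1 for all three indices, and samples sharing no read types give 0. With the definitions in Numerals 6 and 7, C_MH and C_π are algebraically equal (because Σp_xi² = Σx_i²/X²), so the two values should agree up to rounding. For toy counts {A: 3, B: 1} and {A: 1, B: 3}, all three indices equal 0.6.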

The present invention can use next generation sequencing techniques to prepare a sample for quantitative analysis of the repertoire of the variable region of a T cell receptor (TCR) or a B cell receptor (BCR). Such sequencing techniques can obtain a million or more reads from a sample at a reasonable cost. Even a genotype present at a low frequency of 1/1,000,000 or less can be detected specifically and without bias by using these techniques. An unbiased amplification method is thereby achieved for amplifying all of the different sequence types of a specific portion of a gene or transcript from a sample, such as DNA derived from blood, bone marrow or the like.

<Cancer Idiotype Peptide Sensitization Immune Cell Therapeutic Method>

In one aspect, the present invention provides a method of preparing a composition for use in a cancer idiotype peptide sensitization immune cell therapeutic method for a subject. The method comprises (1) analyzing a T cell receptor (TCR) or B cell receptor (BCR) repertoire of the subject by the repertoire analysis method or the repertoire analysis system of the present invention; (2) determining a TCR or BCR derived from a cancer cell of the subject based on a result of the analysis, wherein the determining is done by selecting a high-ranking sequence in a frequency-of-presence ranking of TCR or BCR genes derived from the cancer cell of the subject as the TCR or BCR derived from the cancer cell; (3) determining the amino acid sequence of a candidate HLA test peptide based on the determined cancer-derived TCR or BCR, wherein the determining is performed based on a score calculated by using an HLA binding peptide prediction algorithm; and (4) synthesizing the determined peptide. The synthesized peptide can be used in a cancer idiotype peptide sensitization immune cell therapeutic method. In some cases, this method is called a “cancer idiotype peptide sensitization immune cell therapeutic method” herein.

A cancer idiotype peptide sensitization immune cell therapeutic method can be implemented in clinical practice by the following specific procedures. In short, for example, (1) peripheral blood cells of a cancer patient suffering from a hematological tumor can be collected and lymphocytes separated, the repertoire analysis method of the present invention can then be implemented, and the cancer idiotype peptide sensitization immune cell therapeutic method can be performed using the results.

In another embodiment, the repertoire analysis method of the present invention can be implemented for TCRs in the case of a T cell tumor or for BCRs in the case of a B cell tumor. Subsequently, a high-ranking sequence in the frequency-of-presence ranking of TCR or BCR genes is selected as the TCR or BCR derived from the cancer cell. From a sequence comprising the CDR3 region of that TCR or BCR gene, a peptide that binds to the human leukocyte antigen (HLA) of the cancer patient (the HLA type being determined separately) is predicted by using an HLA binding peptide prediction program (any known program can be used, as further explained herein). The HLA binding peptide is then synthesized with a peptide synthesizer, and the following is performed. For a tailor-made peptide sensitization CTL therapeutic method, peripheral blood mononuclear cells can be collected from the patient, and a mixture of the mononuclear cells or antigen presenting cells from the patient and CD8⁺ T cells can be cultured with the added peptide to stimulate the cells with the antigen peptide.
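The candidate-selection steps above (take the highest-ranking clonotype, then consider peptides drawn from a sequence containing its CDR3) can be sketched in Python as follows. Assumptions: the translated amino acid sequence of the selected receptor is already available; candidates are 9-mers (a length commonly used for HLA class I binding) that overlap the CDR3 by at least one residue; and scoring is delegated to a caller-supplied function standing in for an external prediction program such as BIMAS, SYFPEITHI, RANKPEP or NetMHC. No HLA binding prediction is implemented here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def top_clonotype(reads: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Highest-ranking clonotype in the frequency-of-presence ranking."""
    return Counter(reads).most_common(1)[0][0]


def cdr3_spanning_peptides(protein: str, cdr3: str, length: int = 9) -> List[str]:
    """All peptides of `length` residues from `protein` that overlap the CDR3.
    Assumes `protein` is the translated receptor sequence containing `cdr3`."""
    start = protein.find(cdr3)
    if start < 0:
        raise ValueError("CDR3 not found in the translated sequence")
    end = start + len(cdr3)                      # exclusive end of the CDR3
    first = max(0, start - length + 1)           # window must reach the CDR3
    last = min(end - 1, len(protein) - length)   # window must begin before CDR3 ends
    return [protein[i:i + length] for i in range(first, last + 1)]


def rank_peptides(peptides: List[str], score: Callable[[str], float]) -> List[str]:
    """Order candidates by a binding score, best first. `score` stands in for
    values reported by an external HLA binding predictor."""
    return sorted(peptides, key=score, reverse=True)
```

For a toy sequence made of ten glycines, the CDR3 CASSLGQAYEQYF, and ten more glycines, `cdr3_spanning_peptides` returns 21 overlapping 9-mers, from GGGGGGGGC to FGGGGGGGG. In practice `score` would look up the values reported by the chosen prediction program for the patient's HLA alleles.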

In the tailor-made peptide sensitization CTL therapeutic method, CTL therapy can be administered by introducing the peptide-stimulated lymphocytes into the patient.

Alternatively, a tailor-made peptide sensitization DC vaccine therapeutic method can be implemented by collecting peripheral blood mononuclear cells of a patient, separating the mononuclear cells, inducing their differentiation into dendritic cells (DCs) in the presence of a differentiation inducing factor, adding the peptide and culturing the mixture, and introducing the peptide-sensitized dendritic cells into the patient to administer dendritic cell therapy.

A cancer idiotype peptide sensitization immune cell therapeutic method can be used in patients with hematologic cancers such as acute myeloid leukemia and related precursor cell neoplasms, lymphoblastic leukemia/lymphoma, T lymphoblastic leukemia/lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma, B-cell prolymphocytic leukemia, hairy cell leukemia, T-cell prolymphocytic leukemia, T-cell large granular lymphocyte leukemia, and adult T cell leukemia/lymphoma; diseases similar to leukemia such as multiple myeloma and myelodysplastic syndrome; autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, and type I diabetes; and various infections, as well as in patients with terminal cancer, refractory autoimmune disease or severe infection. Antibody therapeutic methods targeting tumor cells or the like are problematic when the target antigen is not expressed on the tumor cell or is also expressed on normal cells. In comparison, a therapeutic method with higher specificity and fewer side effects is expected here because a sequence specific to the tumor cell is selected and utilized.

In one embodiment, the candidate HLA test peptide of step (3) of the present invention is determined by using BIMAS, SYFPEITHI, RANKPEP or NetMHC.

In another embodiment, the present invention comprises, after step (4) of the present invention, the step of mixing the peptide, an antigen presenting cell or a dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture. This is also called an improved CTL method.

For example, unlike the existing broad T cell activation by an anti-CD3 antibody or IL-2, the improved CTL method imparts antigen specificity to CD8⁺ T cells by utilizing an antigen peptide, so that therapy with higher specificity and fewer side effects can be expected. Further, the method is characterized in that a higher therapeutic effect can be expected because an individualized peptide created based on information obtained from the patient's tumor cells is used.

An improved CTL method can be used in, for example, patients with hematologic cancers such as acute myeloid leukemia and related precursor cell neoplasms, lymphoblastic leukemia/lymphoma, T lymphoblastic leukemia/lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma, B-cell prolymphocytic leukemia, hairy cell leukemia, T-cell prolymphocytic leukemia, T-cell large granular lymphocyte leukemia, and adult T cell leukemia/lymphoma; diseases similar to leukemia such as multiple myeloma and myelodysplastic syndrome; autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, and type I diabetes; and various infections, as well as in terminal cancer patients and patients with a refractory autoimmune disease or severe infection.

In another embodiment, the present invention comprises, after step (4) of the present invention, the step of mixing the peptide with a dendritic cell derived from the subject and culturing the mixture. This is also called a DC vaccine therapy.

For example, since in DC vaccine therapy an individualized peptide is created based on sequence information obtained from the patient's tumor cells, the therapy does not act on normal cells but acts specifically on tumor cells, so that a high therapeutic effect can be expected. Since a peptide rather than a protein is used as the antigen, there is also the advantage that it can be readily prepared by chemical synthesis.

DC vaccine therapy can be used in, for example, patients with hematologic cancers such as acute myeloid leukemia and related precursor hematologic neoplasms, lymphoblastic leukemia/lymphoma, T lymphoblastic leukemia/lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma, B-cell prolymphocytic leukemia, hairy cell leukemia, T-cell prolymphocytic leukemia, T-cell large granular lymphocyte leukemia, and adult T cell leukemia/lymphoma; diseases similar to leukemia such as multiple myeloma and myelodysplastic syndrome; autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, and type I diabetes; and various infections, as well as in patients with terminal cancer, a refractory autoimmune disease or severe infection.

In another embodiment, the present invention comprises, after step (4) of the present invention, the steps of: mixing the peptide, the antigen presenting cell or the dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture to produce a CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture; and mixing the peptide with the dendritic cell derived from the subject and culturing the mixture to produce a dendritic cell-peptide mixture. This is also called a patient autoimmune cell therapeutic method.

For example, in a patient autoimmune cell therapeutic method, CD8⁺ T cells are stimulated and activated with a peptide derived from the patient as in a CTL therapeutic method, and peptide sensitization of dendritic cells is also performed. Such a therapeutic method is characterized in that, by introducing both the dendritic cells and the CD8⁺ T cells derived from the patient into the patient, a synergy can be expected between a sustained effect due to the dendritic cells acting as antigen presenting cells and an acute effect due to the CTLs with imparted specificity.

A patient autoimmune cell therapeutic method can be used in, for example, patients with hematologic cancers (leukemia, etc.) such as acute myeloid leukemia and related precursor cell neoplasms, lymphoblastic leukemia/lymphoma, T lymphoblastic leukemia/lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma, B-cell prolymphocytic leukemia, hairy cell leukemia, T-cell prolymphocytic leukemia, T-cell large granular lymphocyte leukemia, and adult T cell leukemia/lymphoma; diseases similar to leukemia such as multiple myeloma and myelodysplastic syndrome; autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, and type I diabetes; and various infections, as well as in patients with terminal cancer, a refractory autoimmune disease or severe infection.

In another aspect, the present invention provides a method of applying a cancer idiotype peptide sensitization immune cell therapeutic method to a subject. The method comprises (1) analyzing a T cell receptor (TCR) or B cell receptor (BCR) repertoire of the subject by the repertoire analysis method or the repertoire analysis system of the present invention; (2) determining a TCR or BCR derived from a cancer cell of the subject based on a result of the analysis, wherein the determining is done by selecting a high-ranking sequence in a frequency-of-presence ranking of TCR or BCR genes derived from the cancer cell of the subject as the TCR or BCR derived from the cancer cell; (3) determining the amino acid sequence of a candidate HLA test peptide based on the determined cancer-derived TCR or BCR, wherein the determining is performed based on a score calculated by using an HLA binding peptide prediction algorithm; (4) synthesizing the determined peptide; and optionally (5) administering therapy by using the synthesized peptide. The method encompasses both a method of manufacturing a therapeutic agent and a method of therapy itself. When a medical act is excluded, the method can be completed before step (5).

In a preferred embodiment of the present invention, the candidate HLA test peptide of step (3) is determined by using BIMAS, SYFPEITHI, RANKPEP or NetMHC.

BIMAS is a program for estimating HLA-peptide binding, provided at www-bimas dot cit dot nih dot gov/.

SYFPEITHI is a search engine and a database for MHC ligands and peptide motifs, provided at www dot syfpeithi dot de/.

RANKPEP is a program for predicting peptide binding to class I and class II MHC molecules, provided at http colon//imed dot med dot ucm dot es/Tools/rankpep dot html.

NetMHC is a program server for predicting the binding of peptides to numerous HLA alleles, provided at www dot cbs dot dtu dot dk/services/NetMHC/.

In a preferred embodiment, the present invention comprises, after step (4), the steps of: mixing the peptide, an antigen presenting cell or a dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture; and administering the cultured mixture to a patient as an improved CTL method.

In a preferred embodiment, the present invention comprises, after step (4), the steps of: mixing the peptide with the dendritic cell derived from the subject and culturing the mixture; and administering the cultured mixture to a patient as a DC vaccination therapeutic method.

In a preferred embodiment, the present invention comprises, after step (4), the steps of: mixing the peptide, the antigen presenting cell or the dendritic cell derived from the subject, and a CD8⁺ T cell derived from the subject and culturing the mixture to produce a CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture; mixing the peptide with the dendritic cell derived from the subject and culturing the mixture to produce a dendritic cell-peptide mixture; and administering the CD8⁺ T cell-dendritic cell/antigen presenting cell-peptide mixture and the dendritic cell-peptide mixture to a patient as a patient autologous immune cell therapeutic method.

<Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by In Vitro Antigen Stimulation>

In another aspect, the present invention provides a technique for isolation of a tailor-made cancer specific T cell receptor gene, or isolation of a cancer specific TCR gene by in vitro antigen stimulation. Thus, the present invention provides a method of preparing an isolated cancer specific TCR gene by in vitro antigen stimulation, comprising: (A) mixing (i) an antigen peptide or antigen protein derived from a subject, the peptide determined in the “Cancer idiotype peptide sensitization immune cell therapeutic method” of the present invention, or a lymphocyte derived from the subject, (ii) an inactivated cancer cell derived from the subject, and (iii) a T lymphocyte derived from the subject, and culturing the mixture to produce a tumor specific T cell; (B) analyzing a TCR of the tumor specific T cell by the repertoire analysis method or the repertoire analysis system of the present invention; and (C) isolating a desired tumor specific T cell based on a result of the analysis. Once the gene information is obtained, such preparation of an isolated cancer specific TCR gene by in vitro antigen stimulation can be implemented by using any well-known technology in the art. Such an isolated tailor-made cancer specific T cell receptor gene or cancer specific TCR gene can be used to treat or prevent a variety of cancers.

Such an isolated tailor-made cancer specific T cell receptor gene and cancer specific TCR gene can be implemented in clinical practice by using the following specific procedures.

In one embodiment, therapy using an isolated tailor-made cancer specific T cell receptor gene and cancer specific TCR gene can be carried out, for example, as follows: (1) tumor cells are extracted from a cancer patient; (2) after the tumor from the patient is mechanically disrupted, the cells are separated into single cells and inactivated by radiation irradiation or by chemical treatment with mitomycin C or the like; (3) peripheral blood cells are separated from whole blood of the cancer patient; (4) RNA is extracted from part of the peripheral blood cells as an untreated control sample; (5) the inactivated tumor cells and the peripheral blood cells are mixed and cultured to activate and proliferate tumor specific T cells; (6) after activation, the peripheral blood cells are collected as a post-stimulation sample and RNA is extracted from them; (7) the repertoire analysis method of the present invention is performed on the RNA samples extracted in (4) and (6); (8) TCR genes that have greatly increased in the stimulated sample relative to the control sample are extracted and ranked, and high-ranking TCRα and TCRβ genes are selected; (9) the full-length TCRα and TCRβ genes are cloned and introduced into a retroviral vector for gene expression; (10) a gene transfer virus is produced from the TCRα and TCRβ gene expression retroviral vector; (11) lymphocytes collected from the patient are infected independently and successively with the TCRα and TCRβ viruses for transduction, or a gene expression retroviral vector comprising both the TCRα and TCRβ genes is created to transduce both genes at once; (12) expression of TCRα/TCRβ heterodimers on the cell surface is confirmed; and (13) the tumor specific patient lymphocytes expressing the TCRα/TCRβ of interest are infused into the patient.
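Step (8) compares each clonotype's frequency in the stimulated sample with its frequency in the untreated control. The sketch below computes that fold change and ranks by it. The pseudocount is an assumption made here so that clonotypes missing from one sample do not cause division by zero; in practice the function would be run separately on TCRα and TCRβ reads.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def enrichment(control: Iterable[str], stimulated: Iterable[str],
               pseudocount: float = 1.0) -> List[Tuple[str, float]]:
    """Fold change of each clonotype's frequency in the stimulated sample
    relative to the unstimulated control, ranked highest first.

    Frequencies are pseudocount-smoothed over the union of clonotypes seen in
    either sample; ties are broken alphabetically.
    """
    c, s = Counter(control), Counter(stimulated)
    clonotypes = set(c) | set(s)
    k = len(clonotypes)
    n_c, n_s = sum(c.values()), sum(s.values())
    fold = {}
    for t in clonotypes:
        f_c = (c[t] + pseudocount) / (n_c + pseudocount * k)
        f_s = (s[t] + pseudocount) / (n_s + pseudocount * k)
        fold[t] = f_s / f_c
    return sorted(fold.items(), key=lambda kv: (-kv[1], kv[0]))
```

A clonotype absent before stimulation but abundant afterwards (the kind of tumor-reactive expansion step (8) looks for) receives a large fold change rather than an undefined one.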

Specifically, in embodiments of the present invention, the TCR or BCR determined by the method described in “Cancer idiotype peptide sensitization immune cell therapeutic method” can be used as a source of antigen or peptide, for example, for a hematological tumor. In this regard, any cancer antigen or inactivated cancer tissue from a patient can be envisioned as the stimulus, and the following can be utilized as typical methods: a method of mixing any antigen protein or antigen peptide, a T lymphocyte, and an antigen presenting cell; a method of mixing a lymphocyte from a subject and an inactivated cancer cell from the subject; and a method of mixing an antigen presenting cell, a T lymphocyte, and a peptide derived from a TCR or BCR determined by the repertoire analysis provided in “Cancer idiotype peptide sensitization immune cell therapeutic method”.

Thus, in one embodiment, step (A) in the present invention is a step of mixing the inactivated cancer cell derived from the subject and the antigen peptide or antigen protein derived from the subject with the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

In a further embodiment, step (A) in the present invention is a step of mixing the lymphocyte derived from the subject, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

In a further embodiment, step (A) in the present invention is a step of mixing the peptide determined in “Cancer idiotype peptide sensitization immune cell therapeutic method”, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

Such therapy using an isolated tailor-made cancer specific T cell receptor gene and a cancer specific TCR gene can be used in patients with a wide range of cancers, including, but not limited to, adrenocortical carcinoma, anal cancer, bile duct cancer, bladder cancer, breast cancer, cervical cancer, chronic lymphocytic leukemia, chronic myelogenous leukemia, colorectal cancer, endometrial cancer, esophageal cancer, Ewing tumor, gallbladder cancer, Hodgkin's disease, hypopharyngeal cancer, laryngeal cancer, lip and oral cavity cancer, liver cancer, non-small-cell lung cancer, non-Hodgkin's lymphoma, melanoma, mesothelioma, multiple myeloma, ovarian cancer, pancreatic cancer, prostate cancer, gastric cancer, testicular cancer, thyroid cancer and the like.

In a further aspect, the present invention provides isolation of a tailor-made cancer specific T cell receptor gene, and isolation of a cancer specific TCR gene by searching for a common sequence. Thus, the present invention provides a method of preparing an isolated cancer specific TCR gene by searching for a common sequence, comprising: (A) providing a lymphocyte or cancer tissue isolated from subjects having a common HLA; (B) analyzing a TCR of the tumor specific T cell in the lymphocyte or cancer tissue by the repertoire analysis method or the repertoire analysis system of the present invention; and (C) isolating a T cell having a sequence in common with the tumor specific T cell. Once genetic information is obtained, preparation of an isolated cancer specific TCR gene by searching for a common sequence can be performed by using any well-known technology in the art. A gene obtained by such isolation of a tailor-made cancer specific T cell receptor gene or isolation of a cancer specific TCR gene by searching for a common sequence can be used in therapy and prevention of a variety of cancers. The method is also called the “method of isolation of tailor-made cancer specific T cell receptor gene or isolation of cancer specific TCR gene by searching for a common sequence of the present invention”.

A gene obtained by such isolation of a tailor-made cancer specific T cell receptor gene or isolation of a cancer specific TCR gene by searching for a common sequence can be implemented in clinical practice by using the following specific procedures. In one embodiment, therapy using such a gene can be carried out as follows: (1) tumor cells are extracted, or peripheral blood is separated, from cancer patients with the same HLA; (2) repertoire analysis is performed by using lymphocytes or tumor tissue comprising tumor-infiltrating T cells; (3) a ranking is produced for each sample based on frequency of presence, and tumor specific T cells exhibiting a higher frequency of presence in the tumor than in peripheral blood cells are selected; (4) sequences of these tumor specific T cells that are common to a plurality of HLA-matched cancer patients are searched for; (5) the tumor specific TCR gene shared by the most cancer patients is selected as the tumor specific TCR for therapy; (6) the full-length TCRα and TCRβ genes are cloned and introduced into a retroviral vector for gene expression; (7) a gene transfer virus is produced from the TCRα and TCRβ gene expression retroviral vector; (8) lymphocytes collected from the patient are infected independently and successively with the TCRα and TCRβ viruses for transduction, or a gene expression retroviral vector comprising both the TCRα and TCRβ genes is created to transduce both genes at once; (9) expression of TCRα/TCRβ heterodimers on the cell surface is confirmed; and (10) the tumor specific patient lymphocytes expressing the TCRα/TCRβ of interest are infused into the patient.
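Steps (3) to (5) amount to a per-patient filter followed by a count across patients. In this sketch each patient is represented by hypothetical dictionaries mapping clonotype to frequency in tumor and in peripheral blood; what counts as the "same" sequence (for example, CDR3 amino acid sequence together with V and J genes) is an assumption left to the caller.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tumor_enriched(tumor: Dict[str, float], blood: Dict[str, float]) -> Set[str]:
    """Clonotypes more frequent in the tumor sample than in peripheral blood (step 3)."""
    return {t for t, f in tumor.items() if f > blood.get(t, 0.0)}


def shared_tumor_tcrs(patients: List[Dict[str, Dict[str, float]]]) -> List[Tuple[str, int]]:
    """Number of HLA-matched patients in whom each tumor-enriched clonotype occurs,
    most widely shared first (steps 4 and 5); ties are broken alphabetically."""
    counts: Counter = Counter()
    for p in patients:
        # Each patient contributes at most one count per clonotype.
        counts.update(tumor_enriched(p["tumor"], p["blood"]))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

The first entry of the result corresponds to the TCR "shared by the most cancer patients" in step (5).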

Therapy using a gene obtained by such isolation of a tailor-made cancer specific T cell receptor gene or isolation of a cancer specific TCR gene by searching for a common sequence can be used in patients with a wide range of cancers, for example including, but not limited to, adrenocortical carcinoma, anal cancer, bile duct cancer, bladder cancer, breast cancer, cervical cancer, chronic lymphocytic leukemia, chronic myelogenous leukemia, colorectal cancer, endometrial cancer, esophageal cancer, Ewing tumor, gallbladder cancer, Hodgkin's disease, hypopharyngeal cancer, laryngeal cancer, lip and oral cavity cancer, liver cancer, non-small-cell lung cancer, non-Hodgkin's lymphoma, melanoma, mesothelioma, multiple myeloma, ovarian cancer, pancreatic cancer, prostate cancer, gastric cancer, testicular cancer, thyroid cancer and the like.

Thus, in another aspect, the present invention provides a method of isolating a cancer specific TCR gene by in vitro antigen stimulation, comprising: (A) mixing (i) an antigen peptide or antigen protein derived from a subject, the peptide determined in a cancer idiotype peptide sensitization immune cell therapeutic method, or a lymphocyte derived from the subject, (ii) an inactivated cancer cell derived from the subject, and (iii) a T lymphocyte derived from the subject, and culturing the mixture to produce a tumor specific T cell; (B) analyzing a TCR of the tumor specific T cell by the repertoire analysis method or the repertoire analysis system of the present invention; and (C) isolating a desired tumor specific T cell based on a result of the analysis. Once genetic information is obtained, preparation of a cancer specific TCR gene isolated by such in vitro antigen stimulation can be performed by using any well-known technology in the art. Such an isolated tailor-made cancer specific T cell receptor gene or cancer specific TCR gene can be used in therapy and prevention of a variety of cancers.

Thus, in one embodiment of the method of isolating a cancer specific TCR gene by in vitro antigen stimulation, step (A) in the present invention comprises a step of mixing the inactivated cancer cell derived from the subject and the antigen peptide or antigen protein derived from the subject with the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

In a further embodiment, step (A) in the present invention is a step of mixing the lymphocyte derived from the subject, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

In a further embodiment, step (A) in the present invention is a step of mixing the peptide determined in “Cancer idiotype peptide sensitization immune cell therapeutic method”, the inactivated cancer cell derived from the subject, and the T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell.

In still another aspect, the present invention provides a technique of isolating a cancer specific TCR gene by searching for a common sequence, or isolating a tailor-made cancer specific T cell receptor gene, comprising: (A) isolating a lymphocyte or cancer tissue from subjects having a common HLA; (B) analyzing a TCR of the tumor specific T cell in the lymphocyte or cancer tissue by the repertoire analysis method of the present invention; and (C) isolating a T cell having a sequence in common with the tumor specific T cell. Such an isolated tailor-made cancer specific T cell receptor gene or cancer specific TCR gene can be used in therapy and prevention of a variety of cancers.

<Cell Processing Therapeutic Method>

In a further aspect, the present invention provides a cell processing therapeutic method. Specifically, the present invention provides a method of preparing a T lymphocyte introduced with a tumor specific TCR gene for use in a cell processing therapeutic method, comprising: A) providing a T lymphocyte collected from a patient; B) analyzing TCRs based on the repertoire analysis method or the repertoire analysis system of the present invention after applying an antigen stimulation to the T lymphocyte, wherein the antigen stimulation is applied by an antigen peptide or antigen protein derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from the tumor; C) selecting an optimal TCR and an optimal antigen from the analyzed TCRs; and D) producing a viral vector expressing the tumor specific α and β chain TCR genes of the optimal TCR. The cell processing therapeutic method using the T lymphocyte introduced with a tumor specific TCR gene can be used for the therapy and prevention of a variety of cancers.

Such a cell processing therapeutic method using a T lymphocyte introduced with a tumor specific TCR gene can be implemented in clinical practice by using the following specific procedures. For example, a lymphocyte introduced with a tumor specific TCR gene can be obtained by the method described in <Isolation of tailor-made cancer specific T cell receptor gene, isolation of cancer specific TCR gene by in vitro antigen stimulation> or <Isolation of tailor-made cancer specific T cell receptor gene, isolation of cancer specific TCR gene by searching for a common sequence>.

Thus, in the cell processing therapeutic method of the present invention, the antigen may be any cancer antigen or cancer peptide that is manufactured or produced by synthesis, a collected and inactivated cancer cell of the patient, or an idiotype peptide derived from the tumor. As a selection method, it is possible to select an antigen highly expressed in cancer tissue, or to select as the antigen a peptide that binds to the HLA type of the patient.

In a preferred embodiment of the cell processing therapeutic method of the present invention, examples of conceivable optimal antigens that can be selected include, but are not limited to, (1) an antigen highly expressed in the patient's cancer tissue, (2) an antigen that most strongly activates T cells in an antigen specific lymphocyte stimulation test, and (3) an antigen that most increases the frequency of a specific TCR in repertoire analysis before and after antigen stimulation. Further, in example (3), the TCR whose frequency increased the most in repertoire analysis before and after antigen stimulation can be selected as the optimal TCR. As another typical example, candidate optimal TCRs can be artificially transduced into lymphocytes of the patient, and the candidate that exhibits the highest reactivity to the patient's actual cancer tissue can be selected as the optimal TCR.

Such a cell processing therapeutic method using a T lymphocyte introduced with a tumor specific TCR gene can be used in patients with a wide range of cancers including, but not limited to, for example adrenocortical carcinoma, anal cancer, bile duct cancer, bladder cancer, breast cancer, cervical cancer, chronic lymphocytic leukemia, chronic myelogenous leukemia, colorectal cancer, endometrial cancer, esophageal cancer, Ewing tumor, gallbladder cancer, Hodgkin's disease, hypopharyngeal cancer, laryngeal cancer, lip and oral cavity cancer, liver cancer, non-small-cell lung cancer, non-Hodgkin's lymphoma, melanoma, mesothelioma, multiple myeloma, ovarian cancer, pancreatic cancer, prostate cancer, gastric cancer, testicular cancer, thyroid cancer and the like.

Thus, in one embodiment, the antigen stimulation of the method of the present invention is applied with the antigen peptide or antigen protein derived from the subject.

In another embodiment, the antigen stimulation of the method of the present invention is applied with the inactivated cancer cell derived from the subject.

In another embodiment, the antigen stimulation of the method of the present invention is applied with the idiotype peptide derived from the tumor.

In another embodiment, step C) of the present invention comprises selecting an antigen that is highly expressed in cancer tissue of the subject.

In another embodiment, step C) of the present invention comprises selecting an antigen which most strongly activates a T cell in an antigen specific lymphocyte stimulation test.

In another embodiment, step C) of the present invention comprises selecting an antigen that most increases the frequency of a specific TCR in repertoire analysis, conducted based on the repertoire analysis method or the repertoire analysis system of the present invention, before and after applying the antigen stimulation.
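Selecting the antigen that most increases the frequency of a specific TCR, and taking that TCR as the optimal TCR, can be sketched as a search over post-stimulation repertoires. The inputs are hypothetical frequency dictionaries, one for the pre-stimulation repertoire and one per tested antigen; `floor` is an assumed detection limit used for clonotypes not seen before stimulation.

```python
from typing import Dict, Tuple


def max_expansion(before: Dict[str, float], after: Dict[str, float],
                  floor: float = 1e-4) -> Tuple[str, float]:
    """Clonotype with the largest frequency fold change after stimulation.

    `floor` is an assumed detection limit substituted for clonotypes that
    were not observed (or were below it) before stimulation.
    """
    def fold(t: str) -> float:
        return after[t] / max(before.get(t, 0.0), floor)

    best = max(sorted(after), key=fold)  # sorted() makes ties deterministic
    return best, fold(best)


def optimal_antigen(before: Dict[str, float],
                    after_by_antigen: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """(antigen, TCR, fold change) for the antigen that expands a single TCR the most."""
    results = {ag: max_expansion(before, after) for ag, after in after_by_antigen.items()}
    antigen = max(sorted(results), key=lambda ag: results[ag][1])
    tcr, fc = results[antigen]
    return antigen, tcr, fc
```

The same comparison could be made with read counts and a pseudocount instead of a frequency floor; the floor is used here only to keep the sketch short.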

In one specific embodiment, the present invention provides a method of assessing efficacy and/or safety by an in vitro stimulation test using a cancer specific TCR gene isolated by <Isolation of tailor-made cancer specific T cell receptor gene, isolation of cancer specific TCR gene by searching for a common sequence> of the present invention.

Efficacy can be assessed, for example, by culturing a T cell introduced with the cancer specific TCR gene together with the same stimulus that was used for the antigen stimulation, namely an antigen protein or antigen peptide derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from the tumor, and then measuring the amount of cytokines (interferon γ or the like) secreted from the cell in response to T cell activation, measuring the expression level of a specific gene that is elevated in response to T cell activation, or measuring a cell surface molecule that is expressed, or whose expression increases, in response to T cell activation.

<Safety> When a T cell derived from the subject and introduced with a cancer specific TCR gene is mixed with a normal cell derived from the subject, safety can be assessed, for example, by measuring secreted cytokines, gene expression, or expression of a cell surface molecule in response to activation of the above-described T cell, and verifying that the TCR gene-introduced T cell is not activated by the normal cell.

In one embodiment, the specific steps of efficacy and/or safety assessment can be carried out as follows. For example: (1) a retroviral gene expression system is used to create T lymphocytes introduced with tumor specific TCRα and TCRβ genes; (2) when assessing efficacy, cancer cells derived from the patient are extracted, separated, and immortalized, and then mixed and cultured with the T lymphocytes introduced with the tumor specific TCR gene; (3) reactivity to the tumor cells can be quantitatively assessed by performing a cell proliferation test (thymidine uptake test, MTT test, IL-2 production test or the like) with the above-described culture, to select a TCR gene that reacts more strongly to the tumor cells; (4) when assessing safety, control cells, namely an existing cell line, normal tissue free of the patient's cancer cells (part of the normal tissue collected in the process of extracting the tumor), or, in the case where a solid tumor is used, the patient's peripheral blood cells, are immortalized and then mixed and cultured with the T lymphocytes introduced with the tumor specific TCR gene; and (5) reactivity to the control cells can be quantitatively assessed by performing a cell proliferation test (thymidine uptake test, MTT test, IL-2 production test or the like) with the above-described culture, to select a TCR gene that exhibits no reactivity to normal cells.
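The selection in steps (3) and (5) can be expressed as a simple filter over assay readouts. This sketch assumes a stimulation index (readout with target cells divided by readout with the TCR-transduced T cells alone); the thresholds of 2.0 and 1.5 and the `AssayResult` structure are illustrative assumptions, not values or formats from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AssayResult:
    tcr: str
    tumor_signal: float    # readout with the patient's tumor cells (e.g. thymidine uptake)
    control_signal: float  # same readout with control (normal) cells
    background: float      # same readout with the TCR-transduced T cells alone


def select_tcrs(results: List[AssayResult], min_tumor_si: float = 2.0,
                max_control_si: float = 1.5) -> List[Tuple[str, float]]:
    """Keep TCRs that respond to tumor cells (efficacy) but not to control
    cells (safety), ranked by tumor stimulation index. Thresholds are assumptions."""
    kept = []
    for r in results:
        tumor_si = r.tumor_signal / r.background
        control_si = r.control_signal / r.background
        if tumor_si >= min_tumor_si and control_si <= max_control_si:
            kept.append((r.tcr, tumor_si))
    return sorted(kept, key=lambda kv: (-kv[1], kv[0]))
```

A TCR with the strongest tumor response can still be rejected here if it also reacts to the control cells, which mirrors the separate efficacy and safety criteria above.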

Thus, in another aspect, the present invention provides a cell processing therapeutic method, comprising: A) collecting a T lymphocyte from a patient; B) analyzing TCRs based on the repertoire analysis method or the repertoire analysis system of the present invention after applying antigen stimulation to the T lymphocyte, wherein the antigen stimulation is applied by an antigen peptide or antigen protein derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from the tumor; C) selecting an optimal TCR and an optimal antigen from the analyzed TCRs; D) producing a viral vector expressing the tumor specific α and β chain TCR genes of the optimal TCR; and E) introducing the T lymphocyte introduced with the tumor specific TCR gene into the patient.

A method of implementing the step of introducing the resulting T lymphocyte introduced with a tumor specific TCR gene into the patient comprises the following: A) manufacturing the T lymphocyte introduced with the tumor specific TCR gene; B) confirming expression of the tumor specific TCRα and TCRβ; and C) administering the T lymphocyte introduced with the tumor specific TCR gene by intravenous drip.

Thus, in one embodiment, the antigen stimulation in the cell processing therapeutic method of the present invention is applied with the antigen peptide or antigen protein derived from the subject.

In another embodiment, the antigen stimulation in the cell processing therapeutic method of the present invention is applied with the inactivated cancer cell derived from the subject.

In another embodiment, the antigen stimulation in the cell processing therapeutic method of the present invention is applied with the idiotype peptide derived from the tumor.

In another embodiment, step C) in the cell processing therapeutic method of the present invention comprises selecting an antigen that is highly expressed in cancer tissue of the subject.

In another embodiment, step C) in the cell processing therapeutic method of the present invention comprises selecting an antigen which most strongly activates a T cell in an antigen specific lymphocyte stimulation test.

In another embodiment, step C) in the cell processing therapeutic method of the present invention comprises selecting an antigen that most increases the frequency of a specific TCR in repertoire analysis, conducted based on the repertoire analysis method of the present invention, before and after applying the antigen stimulation.

<Isolation of Human Form Antibody Utilizing BCR Repertoire Analysis>

As one embodiment, the repertoire analysis method of the present invention can be used to perform BCR gene repertoire analysis to quickly obtain a human form antibody specific to a target antigen by the methods described below.
(A) a method of immunizing a mouse with a target antigen protein or antigen peptide and separating a cell population (e.g., spleen, lymph node, or peripheral blood cells) comprising antibody producing B cells from the mouse to analyze immunoglobulin heavy chain and light chain genes by the repertoire analysis method of the present invention
(A1) the method of A, wherein the immunized mouse is a KM mouse capable of producing fully human antibodies while maintaining antibody diversity
(A2) the method of A, wherein the immunized mouse is a humanized mouse created by transplanting human stem cells into an NOG (NOD/Shi-scid, IL-2Rγnull) mouse, which exhibits severe combined immunodeficiency and is made by mating an IL-2 receptor γ chain knockout mouse with a NOD/scid mouse
(B) comparing the immunoglobulin heavy chain and light chain gene sequences, and their frequencies, obtained from samples derived from a control mouse and an immunized mouse, or from mice before and after antigen immunization
(C) identifying immunoglobulin heavy chain and light chain genes that are strongly expressed, or that increase after immunization, in the immunized mouse
(D) selecting the immunoglobulin heavy chain and light chain genes identified in step C and inserting the genes together into one type of antibody expression vector, or separately into two types of antibody expression vectors
(E) introducing the immunoglobulin heavy chain and light chain gene expression vector(s) made in step D into a eukaryotic cell such as a CHO (Chinese Hamster Ovary) cell and culturing the cell
(F) separating/purifying the antibody molecule produced or secreted by the genetically modified cell to examine its specificity for the target antigen protein or peptide.

The above-described steps A-F are methods of directly and quickly obtaining an antigen specific human form antibody, without the need to convert an animal-derived antibody gene, after it is obtained, into a chimeric antibody or a humanized antibody. The methods can be used in the development and manufacture of antibody medicines consisting of human form antibodies.
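Steps B to D can be sketched in Python: heavy and light chains are ranked separately by how much their frequency rises in the immunized mouse, and the top chains are combined into candidate heavy/light pairs. The function names, the pseudocount, and the `top` cutoff are assumptions of this sketch, and both samples are assumed to be non-empty. Enumerating combinations reflects the fact that bulk repertoire sequencing of separately amplified heavy and light chains does not record which chains were paired in the same B cell, so each candidate pair would still need to be tested in steps E and F.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple


def enriched_chains(control: Iterable[str], immunized: Iterable[str],
                    top: int = 2, pseudocount: float = 1.0) -> List[str]:
    """Chains ranked by frequency fold change, immunized vs. control (steps B and C).

    Only chains observed in the immunized sample are ranked; ties are broken
    alphabetically. Both samples must be non-empty.
    """
    c, i = Counter(control), Counter(immunized)
    n_c, n_i = sum(c.values()), sum(i.values())
    fold = {s: ((i[s] + pseudocount) / n_i) / ((c[s] + pseudocount) / n_c) for s in i}
    return sorted(fold, key=lambda s: (-fold[s], s))[:top]


def candidate_pairs(heavy_control: Iterable[str], heavy_immunized: Iterable[str],
                    light_control: Iterable[str], light_immunized: Iterable[str],
                    top: int = 2) -> List[Tuple[str, str]]:
    """Every heavy x light combination of the top-ranked chains (input to step D)."""
    heavy = enriched_chains(heavy_control, heavy_immunized, top)
    light = enriched_chains(light_control, light_immunized, top)
    return list(product(heavy, light))
```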

For the KM mice used in this embodiment, the following can be referred to: Ishida I, Tomizuka K, Yoshida H, Tahara T, Takahashi N, Ohguma A, Tanaka S, Umehashi M, Maeda H, Nozaki C, Halk E, Lonberg N. Production of human monoclonal and polyclonal antibodies in TransChromo animals. Cloning Stem Cells. 2002; 4(1): 91-102. Review. For NOG mice, the following can be referred to: Ito M, Hiramatsu H, Kobayashi K, Suzue K, Kawahata M, Hioki K, Ueyama Y, Koyanagi Y, Sugamura K, Tsuji K, Heike T, Nakahata T. NOD/SCID/gamma(c)(null) mouse: an excellent recipient mouse model for engraftment of human cells. Blood. 2002 Nov. 1; 100(9): 3175-82. For CHO cells/antibody production, the following can be referred to: Jayapal K P, Wlaschin K F, Hu W-S, Yap M G S. Recombinant protein therapeutics from CHO cells - 20 years and counting. Chem Eng Prog. 2007; 103: 40-47; Chusainow J, Yang Y S, Yeo J H, Toh P C, Asvadi P, Wong N S, Yap M G. A study of monoclonal antibody-producing CHO cell lines: what makes a stable high producer? Biotechnol Bioeng. 2009 Mar. 1; 102(4): 1182-96.

Embodiments of such methods include the following. As one example thereof:

1. A KM mouse is immunized with myelin oligodendrocyte glycoprotein peptide 35-55 (MOG35-55, MOG), an antigen peptide used to induce experimental autoimmune encephalomyelitis. Equal quantities of 2 mg/mL MOG peptide and complete Freund's adjuvant are mixed to create an emulsion. The mouse is subcutaneously immunized with 200 μg of MOG and simultaneously administered 400 ng of pertussis toxin intraperitoneally. A control mouse is immunized with PBS and complete Freund's adjuvant.
2. On day 2 after the first immunization, the mouse is again administered 400 ng of pertussis toxin. After disease onset is confirmed on day 10 after immunization, the spleen is extracted from the mouse that developed encephalomyelitis.
3. The spleens of the diseased mouse and the control mouse are used to carry out next generation BCR repertoire analysis. Frequencies of appearance of individual BCR sequences are counted and ranked for the immunoglobulin heavy chain and the immunoglobulin light chains.
4. BCR sequences with a large increase in frequency of appearance in the diseased mouse relative to the control mouse are extracted and ranked. A combination of high-ranking BCR sequences induced by the antigen administration is identified as a MOG specific antibody gene.
5. A full-length human immunoglobulin sequence is cloned by PCR cloning from the BCR gene amplicon amplified from the diseased mouse. Each of the IgG immunoglobulin heavy chain and the immunoglobulin light chain is cloned into an antibody expression vector, either by inserting the genes together into one type of antibody expression vector or by inserting them separately into two types of antibody expression vectors.
6. CHO (Chinese Hamster Ovary) cells are transfected with the constructed expression vector(s) encoding the IgG immunoglobulin heavy chain and immunoglobulin light chain by using Lipofectamine 3000 (Life Science).
7. The CHO cell culture solution is collected. Secreted antibody proteins are recovered by purification with a protein A affinity column and concentration with gel filtration.
8. Binding activity to MOG35-55 or MOG protein is measured by an ELISA assay using the collected antibody to investigate the specificity of the antibody.
9. When sufficient specificity is obtained, a cell line stably expressing the antibody is established, and a human form anti-MOG antibody is manufactured with a large-scale culture system.
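If "the same quantity" in step 1 of this example is read as equal volumes of peptide solution and adjuvant, the MOG concentration in the emulsion and the corresponding injection volume follow directly. The 200 μL figure is derived here under that assumption and is not stated in the protocol itself.

```latex
c_{\mathrm{MOG,\,emulsion}} = \frac{2\ \mathrm{mg/mL} \times V}{V + V} = 1\ \mathrm{mg/mL} = 1\ \mu\mathrm{g}/\mu\mathrm{L},
\qquad
V_{\mathrm{inj}} = \frac{200\ \mu\mathrm{g}}{1\ \mu\mathrm{g}/\mu\mathrm{L}} = 200\ \mu\mathrm{L}
```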

(Peptide and Therapy of the Present Invention)

The peptide of the present invention or a nucleic acid encoding the same can be used in immunotherapy, as described below.

A peptide provided by the present invention is derived from an antigen associated with tumorigenesis and can have the ability to bind sufficiently to an MHC (HLA) class II molecule to trigger an immune response in a human, in particular a response of lymphocytes, more particularly T lymphocytes, still more particularly CD4-positive T lymphocytes, and especially a TH1-type immune response induced by CD4-positive T lymphocytes.

As used herein, “protein”, “polypeptide”, “oligopeptide” and “peptide” are used with the same meaning and refer to an amino acid polymer of any length. Such a polymer may be straight-chain, branched, or cyclic. An amino acid may be natural, non-natural, or altered. The term may also encompass a complex assembled from a plurality of polypeptide chains. The term also encompasses natural or artificially altered amino acid polymers. Examples of such an alteration include disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, and any other manipulation or alteration (e.g., conjugation with a labeling component). The definition also encompasses, for example, a polypeptide comprising one or more amino acid analogs (e.g., a non-natural amino acid), a peptide-like compound (e.g., a peptoid), and other alterations known in the art. As used herein, an “amino acid” may be natural or non-natural as long as the objective of the present invention is met.

As used herein, “polynucleotide”, “oligonucleotide” and “nucleic acid” are used with the same meaning and refer to a nucleotide polymer of any length. The terms also encompass an “oligonucleotide derivative” and a “polynucleotide derivative”. An “oligonucleotide derivative” or “polynucleotide derivative” refers to an oligonucleotide or polynucleotide that includes a nucleotide derivative or has an unusual bond between nucleotides; the two terms are used interchangeably. Specific examples of such an oligonucleotide include 2′-O-methyl-ribonucleotide; an oligonucleotide derivative in which a phosphodiester bond is converted to a phosphorothioate bond; an oligonucleotide derivative in which a phosphodiester bond is converted to an N3′-P5′ phosphoramidate bond; an oligonucleotide derivative in which a ribose and phosphodiester bond are converted to a peptide nucleic acid bond; an oligonucleotide derivative in which uracil is substituted with C-5 propynyl uracil; an oligonucleotide derivative in which uracil is substituted with C-5 thiazole uracil; an oligonucleotide derivative in which cytosine is substituted with C-5 propynyl cytosine; an oligonucleotide derivative in which cytosine is substituted with phenoxazine-modified cytosine; an oligonucleotide derivative in which ribose in a DNA is substituted with 2′-O-propylribose; and an oligonucleotide derivative in which ribose is substituted with 2′-methoxyethoxyribose. Unless specifically noted otherwise, a specific nucleic acid sequence is also intended to encompass conservatively altered variants thereof (e.g., degenerate codon substituted forms) and complementary sequences, in addition to the explicitly shown sequence.
Specifically, a degenerate codon substituted form can be obtained by creating a sequence in which the third position of one or more selected (or all) codons is substituted with a mixed base and/or a deoxyinosine residue (Batzer et al., Nucleic Acids Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8: 91- (1994)). As used herein, “nucleic acid” is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide. As used herein, a “nucleotide” may be natural or non-natural.
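The third-position substitution that produces a degenerate codon substituted form can be sketched as a simple string operation. This is a hypothetical helper, assuming "N" denotes the IUPAC mixed base and using "I" only as a placeholder letter for deoxyinosine (it is not an IUPAC code). In practice only codons whose third position is synonymous (e.g., fourfold-degenerate codons such as GCN for alanine) would be selected; the sketch does not check this.

```python
def degenerate_third_positions(cds, codon_indices=None, symbol="N"):
    """Replace the third base of selected codons with a degenerate symbol.

    cds: coding sequence whose length is a multiple of 3.
    codon_indices: 0-based codon positions to alter; None means all codons.
    symbol: "N" for a mixed base, or "I" as a placeholder for deoxyinosine.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    targets = range(len(codons)) if codon_indices is None else set(codon_indices)
    for idx in targets:
        # Keep the first two bases; only the wobble position is made degenerate
        codons[idx] = codons[idx][:2] + symbol
    return "".join(codons)


# Ala-Gly-Leu, all of which are fourfold degenerate at the third position
all_degenerate = degenerate_third_positions("GCTGGACTG")
one_inosine = degenerate_third_positions("GCTGGACTG", [1], "I")
```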

As used herein, “gene” refers to an agent defining a genotype. A “gene” may refer to a “polynucleotide”, “oligonucleotide” or “nucleic acid”.

In addition to the identified peptides, a variant thereof may also be used in the present invention. Examples of such a variant include, but are not limited to, those that are homologous to the identified peptide.

As used herein, “homology” of genes refers to the level of identity of two or more genetic sequences to one another. In general, having “homology” refers to having a high level of identity or similarity; the higher the homology of two genes, the higher the identity or similarity of their sequences. Whether two genes are homologous can be examined by direct comparison of their sequences or, for nucleic acids, by hybridization under stringent conditions. When two genetic sequences are compared directly, the genes are typically homologous when their DNA sequences are at least 50% identical, preferably at least 70% identical, and more preferably at least 80%, 90%, 95%, 96%, 97%, 98% or 99% identical. As used herein, a “homolog” or “homologous gene product” refers to a protein in another species, preferably a mammal, that exerts the same biological function as a protein constituent element of a complex further described herein. Such a homolog is also called an “ortholog gene product”. It is understood that such a homolog, homologous gene product, ortholog gene product and the like can also be used as long as they align with the objective of the present invention.

An amino acid may be referred to herein by its commonly known three-letter symbol or by the one-letter symbol recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Similarly, a nucleotide may be referred to by its commonly recognized one-letter code. Herein, similarity, identity and homology of amino acid sequences and base sequences are calculated with the sequence analysis tool BLAST using default parameters. For instance, identity can be searched by using NCBI's BLAST 2.2.28 (released Apr. 2, 2013). The value of identity herein generally refers to the value obtained when the above-described BLAST is used to align sequences under default conditions. However, when a higher value is output by changing a parameter, the highest value is considered the value of identity. When identity is assessed in a plurality of regions, the highest value among them is considered the value of identity. Similarity is a value that counts similar amino acids, in addition to identical amino acids, in the calculation.
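The difference between identity and similarity can be illustrated on two sequences that are already aligned. This sketch assumes a simplified grouping of "similar" residues chosen for illustration; BLAST itself reports Positives from positive substitution-matrix scores (BLOSUM62 by default for blastp), so its values will generally differ. The group list and function names are hypothetical.

```python
# Simplified residue groups (an illustrative assumption, not BLAST's
# substitution-matrix definition of "Positives").
GROUPS = ["AVILMFWP", "GSTYNQC", "KRH", "DE"]


def _same_group(a, b):
    return any(a in g and b in g for g in GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned strings.

    Gaps are written as '-'. Both percentages use the full alignment length
    as the denominator, so gap columns count against both values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1
        elif _same_group(a, b):
            # Non-identical but similar residues raise similarity only
            similar += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * similar / n
```

For example, ACDEK against ACEEK gives 80% identity but 100% similarity, because D and E fall in the same group.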

In one embodiment of the present invention, “several” may be, for example, 10, 8, 6, 5, 4, 3 or 2, or a value less than any one of these values. It is known that a polypeptide with a deletion, addition, insertion or substitution of one to several amino acid residues can retain its biological activity (Mark et al., Proc Natl Acad Sci USA. 1984 September; 81(18): 5662-5666; Zoller et al., Nucleic Acids Res. 1982 Oct. 25; 10(20): 6487-6500; Wang et al., Science. 1984 Jun. 29; 224(4656): 1431-1433). An antibody with a deletion or the like can be made, for example, by site-directed mutagenesis, random mutagenesis, or biopanning using an antibody phage library. For example, the KOD-Plus-Mutagenesis Kit (TOYOBO CO., LTD.) can be used for site-directed mutagenesis. An antibody with the same activity as the wild type can be selected from mutant antibodies carrying a deletion or the like by performing various characterizations such as FACS analysis or ELISA.

In one embodiment of the present invention, “90% or greater” may be, for example, 90, 95, 96, 97, 98, 99 or 100% or greater, or within the range between any two of these values. For the above-described “homology”, the percentage of homologous amino acids between two or more amino acid sequences may be calculated in accordance with a method known in the art. Before the percentage is calculated, the amino acid sequences to be compared are aligned, with gaps introduced into portions of the sequences when necessary to maximize the percentage of identical amino acids. Alignment methods, methods of calculating the percentage, comparison methods, and associated computer programs are well known in the art (e.g., BLAST, GENETYX and the like). As used herein, “homology” can represent a value measured by NCBI BLAST unless specifically noted otherwise. Blastp with default settings can be used as the algorithm for comparing amino acid sequences with BLAST. Measurement results are expressed numerically as Positives or Identities.
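The align-then-count procedure described above can be sketched with a basic Needleman-Wunsch global alignment followed by a percent identity calculation. This is a teaching sketch, not BLAST: it assumes simple scores (match +1, mismatch -1, linear gap -1) instead of a substitution matrix with affine gap penalties, so its percentages will not in general equal BLAST output. Function names and parameters are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns the aligned strings."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )
    # Trace back from the bottom-right cell, preferring diagonal moves
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical (non-gap) columns divided by the alignment length, in percent."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


aligned = needleman_wunsch("ACDEFG", "ACEFG")
pid = percent_identity(*aligned)
```

Here a single gap opposite D gives 5 identical columns out of 6 (about 83.3%), which would fall below a "90% or greater" threshold.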

As used herein, a “polynucleotide that hybridizes under stringent conditions” refers to a polynucleotide that hybridizes under conventional conditions well known in the art. Such a polynucleotide can be obtained by colony hybridization, plaque hybridization, Southern blot hybridization or the like, using a polynucleotide selected from the polynucleotides of the present invention as a probe. Specifically, such a polynucleotide refers to a polynucleotide that can be identified by performing hybridization at 65° C. in the presence of 0.7-1.0 M NaCl using a filter with immobilized DNA derived from a colony or plaque, and then washing the filter at 65° C. with a 0.1- to 2-fold concentration SSC (saline-sodium citrate) solution (a 1-fold concentration SSC solution is composed of 150 mM sodium chloride and 15 mM sodium citrate). The following are examples of conditions that can be used as a “stringent condition”: (1) washing at low ionic strength and high temperature (e.g., 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50° C.); (2) use of a denaturing agent such as formamide during hybridization (e.g., 50% (v/v) formamide, 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5, 750 mM sodium chloride, and 75 mM sodium citrate at 42° C.); or (3) overnight incubation at 37° C. in a solution comprising 20% formamide, 5×SSC, 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filter with 1×SSC at about 37-50° C. The formamide concentration may be 50% or greater. The washing time may be 5, 15, 30, 60, or 120 minutes, or longer. A plurality of factors, such as temperature and salt concentration, are considered to affect stringency in a hybridization reaction. For details, see Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience Publishers (1995).
A “highly stringent condition” is, for example, 0.0015 M sodium chloride and 0.0015 M sodium citrate at 65-68° C., or 0.015 M sodium chloride, 0.0015 M sodium citrate and 50% formamide at 42° C. Hybridization is performed in accordance with the methods described in experimental publications such as Molecular Cloning, 2nd ed.; Current Protocols in Molecular Biology, Supplements 1-38; and DNA Cloning 1: Core Techniques, A Practical Approach, Second Edition, Oxford University Press (1995). In this regard, a sequence consisting only of A or only of T is preferably excluded from the sequences that hybridize under stringent conditions. A moderately stringent condition can be readily determined by those skilled in the art based on, for example, the length of the DNA, and is described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Ed., Vol. 1, 7.42-7.45, Cold Spring Harbor Laboratory Press, 2001. For nitrocellulose filters, such conditions include a pre-wash solution of 1.0 mM EDTA (pH 8.0), 5×SSC and 0.5% SDS; hybridization conditions of about 50% formamide and 2×SSC to 6×SSC at about 40-50° C. (or other similar hybridization solutions, such as Stark's solution in about 50% formamide at about 42° C.); and washing conditions of 0.5×SSC and 0.1% SDS at about 60° C. Thus, the polypeptides used in the present invention encompass polypeptides encoded by a nucleic acid molecule that hybridizes under highly or moderately stringent conditions to a nucleic acid molecule encoding a polypeptide specifically described in the present invention.
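Because the buffers above are specified sometimes as SSC multiples and sometimes as molar salt concentrations, a small conversion helper can make the correspondence explicit. It uses only the 1×SSC composition stated in the text (150 mM sodium chloride, 15 mM sodium citrate); the function names are hypothetical.

```python
# 1x SSC composition as stated in the text
NACL_MM_PER_1X = 150.0     # sodium chloride, mM
CITRATE_MM_PER_1X = 15.0   # sodium citrate, mM


def ssc_composition(fold):
    """Return (NaCl mM, sodium citrate mM) for an SSC solution of the given fold."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return NACL_MM_PER_1X * fold, CITRATE_MM_PER_1X * fold


def ssc_fold_from_nacl(nacl_molar):
    """Infer the SSC fold from a NaCl concentration given in mol/L."""
    return nacl_molar * 1000.0 / NACL_MM_PER_1X


wash_range = (ssc_composition(0.1), ssc_composition(2))
```

By this arithmetic, the 0.015 M sodium chloride/0.0015 M sodium citrate wash in condition (1) corresponds to 0.1×SSC, and the 750 mM sodium chloride/75 mM sodium citrate in condition (2) corresponds to 5×SSC.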

As used herein, a “purified” substance or biological agent (e.g., nucleic acid, protein or the like) refers to a substance or biological agent from which at least some of the agents naturally accompanying it have been removed. Thus, the purity of a biological agent in a purified biological agent is higher than in its normal state (i.e., it is concentrated). The term “purified” as used herein preferably refers to the presence of at least 75 wt. %, more preferably at least 85 wt. %, still more preferably at least 95 wt. %, and most preferably at least 98 wt. % of biological agents of the same type. A substance or biological agent used in the present invention is preferably a “purified” substance. An “isolated” substance or biological agent (e.g., nucleic acid, protein, or the like) as used herein refers to a substance or biological agent from which the agents naturally accompanying it have been substantially removed. The meaning of the term “isolated” as used herein varies depending on the objective, so the term does not necessarily need to be expressed in terms of purity. However, when necessary, the term preferably refers to the presence of at least 75 wt. %, more preferably at least 85 wt. %, still more preferably at least 95 wt. %, and most preferably at least 98 wt. % of biological agents of the same type. A substance used in the present invention is preferably an “isolated” substance or biological agent.

As used herein, a “fragment” refers to a polypeptide or polynucleotide with a sequence length of 1 to n−1 relative to the full-length polypeptide or polynucleotide (of length n). The length of a fragment can be changed as appropriate for the objective. Examples of the lower limit of such a length include 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50 and more amino acids for a polypeptide. Lengths represented by integers not specifically listed herein (e.g., 11 and the like) can also be suitable as a lower limit. Further, examples of the length include 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, and more nucleotides for a polynucleotide. Lengths represented by integers not specifically listed herein (e.g., 11 and the like) can also be suitable as a lower limit. As used herein, when the full-length version functions as, for example, a marker or a target molecule, such a fragment is understood to be within the scope of the present invention as long as the fragment itself also functions as a marker or a target molecule.

As used herein, a “functional equivalent” refers to any entity that has the same function of interest as, but a different structure from, the original target entity. A functional equivalent can be found by searching a database or the like. As used herein, “search” refers to using a certain nucleic acid base sequence electronically, biologically, or by another method to find another nucleic acid base sequence having a specific function and/or property. Examples of electronic searches include, but are not limited to, BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)), FASTA (Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85: 2444-2448 (1988)), the Smith and Waterman method (Smith and Waterman, J. Mol. Biol. 147: 195-197 (1981)), the Needleman and Wunsch method (Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970)) and the like. Examples of biological searches include, but are not limited to, stringent hybridization, a macroarray with genomic DNA applied to a nylon membrane or the like, a microarray with genomic DNA applied to a glass plate (microarray assay), PCR, in situ hybridization and the like. Herein, a gene used in the present invention is intended to include corresponding genes identified by such an electronic or biological search.
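Of the electronic search methods listed, the Smith and Waterman method finds the best-scoring local alignment, which is why it suits searching for a shared region inside otherwise unrelated sequences. The following minimal sketch computes only the local alignment score, assuming simple illustrative scores (match +2, mismatch -1, linear gap -1) rather than the substitution matrices and affine gaps used by production tools; the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming matrix are kept, so memory
    grows with len(b) rather than len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


shared_core = smith_waterman_score("TTACGTAA", "GGACGTCC")
```

The two example sequences share only the core ACGT, so the local score is 4 × 2 = 8, while sequences with no common residues score 0.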

As a functional equivalent of the present invention, an amino acid sequence with one or more amino acid insertions, substitutions or deletions, or additions to one or both ends, can be used. As used herein, “one or more amino acid insertions, substitutions or deletions, or additions to one or both ends” in an amino acid sequence refers to an alteration, such as substitution of a plurality of amino acids, to an extent that can be produced by a well-known technical method such as site-directed mutagenesis or can occur by natural mutation. An altered amino acid sequence of a molecule can have, for example, 1-30, preferably 1-20, more preferably 1-9, still more preferably 1-5, and especially preferably 1-2 amino acid insertions, substitutions, deletions, or additions to one or both ends. An altered amino acid sequence may be an amino acid sequence having one or more (preferably 1 or several, or 1, 2, 3 or 4) conservative substitutions in an amino acid sequence such as CCL21, CXCR3, or CCR7. A “conservative substitution” refers herein to the substitution of one or more amino acid residues with other chemically similar amino acid residues so as not to substantially alter the function of a protein. Examples include substituting a hydrophobic residue with another hydrophobic residue and substituting a polar residue with another polar residue having the same charge. Functionally similar amino acids that can be substituted in this manner are known in the art for each amino acid. Specific examples include alanine, valine, isoleucine, leucine, proline, tryptophan, phenylalanine, methionine and the like for nonpolar (hydrophobic) amino acids, and glycine, serine, threonine, tyrosine, glutamine, asparagine, cysteine and the like for polar (neutral) amino acids. Examples of positively charged (basic) amino acids include arginine, histidine, lysine and the like.
Further, examples of negatively charged (acidic) amino acids include aspartic acid, glutamic acid and the like.
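The residue classes listed above can be used to decide whether a substitution is conservative and to enumerate candidate variants with a single conservative substitution. This sketch assumes that the four classes named in the text (nonpolar, polar neutral, basic, acidic) are the only grouping; other published groupings differ, and the helper names are hypothetical. Each generated variant would still need to be tested for retained activity.

```python
# Residue classes as listed in the text (one-letter codes)
CLASSES = {
    "nonpolar": "AVILPWFM",
    "polar_neutral": "GSTYQNC",
    "basic": "RHK",
    "acidic": "DE",
}


def residue_class(residue):
    """Return the name of the class containing the residue."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original, replacement):
    """True if the substitution changes the residue but stays within one class."""
    return original != replacement and residue_class(original) == residue_class(replacement)


def single_conservative_variants(peptide):
    """All variants carrying exactly one conservative substitution."""
    variants = []
    for pos, res in enumerate(peptide):
        for alt in CLASSES[residue_class(res)]:
            if alt != res:
                variants.append(peptide[:pos] + alt + peptide[pos + 1:])
    return variants


variants = single_conservative_variants("DK")
```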

As used herein, “subject” refers to a target subjected to diagnosis, detection or the like of the present invention.

As used herein, “agent” is used broadly and may be any substance or other element (e.g., energy such as radiation, heat, or electricity) as long as the intended objective can be achieved. Examples of such a substance include, but are not limited to, a protein, polypeptide, oligopeptide, peptide, polynucleotide, oligonucleotide, nucleotide, nucleic acid (including, for example, DNAs such as cDNA and genomic DNA, and RNAs such as mRNA), polysaccharide, oligosaccharide, lipid, organic small molecule (e.g., a hormone, ligand, signaling substance, molecule synthesized by combinatorial chemistry, or small molecule that can be used as a medicine (e.g., a small molecule ligand)), and composite molecules thereof. Typical examples of an agent specific to a polynucleotide include, but are not limited to, a polynucleotide having complementarity with a certain sequence homology (e.g., 70% or greater sequence identity) to the sequence of the polynucleotide, and a polypeptide such as a transcription factor that binds to a promoter region. Typical examples of an agent specific to a polypeptide include, but are not limited to, an antibody directed specifically to the polypeptide or a derivative or analog thereof (e.g., a single-chain antibody), a specific ligand or receptor when the polypeptide is a receptor or ligand, and a substrate when the polypeptide is an enzyme.

As used herein, “therapy” refers to preventing the worsening of, preferably maintaining the current condition of, more preferably alleviating, and still more preferably eliminating a disease or disorder (e.g., cerebral malaria) when such a condition is present, and includes being able to exert a prophylactic effect or an effect of improving one or more symptoms accompanying the disease. Therapy selected as suitable on the basis of a preliminary diagnosis may be referred to as “companion therapy”, and a diagnostic agent therefor as a “companion diagnostic agent”.

As used herein, “therapeutic agent” broadly refers to any agent capable of treating a condition of interest (e.g., a disease such as cerebral malaria) and includes an inhibitor (e.g., an antibody) such as those provided by the present invention. In one embodiment of the present invention, the “therapeutic agent” may be a pharmaceutical composition comprising an active ingredient and one or more pharmacologically acceptable carriers. A pharmaceutical composition can be manufactured, for example, by mixing an active ingredient and the above-described carriers by any method known in the technical field of pharmaceutics. Further, the form of use of a therapeutic agent is not limited as long as it is used for therapy. A therapeutic agent may be an active ingredient alone or a mixture of an active ingredient and any other ingredient. Further, the form of the above-described carriers is not particularly limited; for example, the carrier may be a solid or a liquid (e.g., a buffer solution).

As used herein, “prevention” refers to taking measures to keep a disease or disorder (e.g., cerebral malaria) from developing before such a condition arises. For example, the agent of the present invention can be used to perform diagnosis and, optionally, to prevent or take measures to prevent cerebral malaria or the like.

As used herein, “prophylactic agent” broadly refers to any agent capable of preventing a condition of interest (e.g., a disease such as cerebral malaria).

The present invention provides a peptide that is derived from an antigen associated with tumorigenesis and has the ability to bind sufficiently to an MHC (HLA) class I molecule to trigger an immune response of human leukocytes, in particular lymphocytes, more particularly T lymphocytes, and especially CD8-positive cytotoxic T lymphocytes, as well as combinations of two peptides that are especially useful for vaccinating a cancer patient.

The peptide of the present invention may be derived from a tumor-associated antigen, in particular, for example, a tumor-associated antigen with a function in proteolysis, angiogenesis, cell growth, cell cycle regulation, cell division, transcriptional regulation, tissue infiltration or the like.

A peptide can be chemically synthesized and used as an active pharmaceutical ingredient in the manufacture of a medicament. Thus, the peptide provided by the present invention can be used in immunotherapy, preferably cancer immunotherapy.

The pharmaceutical composition of the present invention can further comprise an additional peptide and/or an excipient to enhance its effect, as further explained below.

The pharmaceutical composition of the present invention can comprise a peptide identified in the present invention, the peptide having a full length of 8-100 amino acids, preferably 8-30 amino acids, and most preferably 8-16 amino acids.

In addition, the peptide or variant can be further modified to improve its stability and/or its binding to an MHC molecule in order to induce a more potent immune response. Methods of optimizing such a peptide sequence are well known to those skilled in the art and include, for example, introduction of a non-peptide bond or a reversed peptide bond. Thus, another embodiment of the present invention provides a pharmaceutical composition wherein at least one peptide or variant thereof comprises a non-peptide bond.

In a reversed peptide bond, amino acid residues are not joined by a peptide (—CO—NH—) linkage; instead, the peptide bond is reversed. Such a retro-inverso peptidomimetic can be made by methods well known to those skilled in the art, for example, the method described in Meziere et al (1997) J. Immunol. 159, 3230-3237, incorporated herein by reference. This approach involves creating a pseudopeptide containing changes that involve the backbone but not the orientation of the side chains. Meziere et al (1997) show that such pseudopeptides are useful for MHC binding and T helper cell responses. A retro-inverso peptide comprising an NH—CO bond instead of a CO—NH peptide bond is much more resistant to proteolysis.

A non-peptide bond is, for example, —CH₂—NH, —CH₂S—, —CH₂CH₂—, —CH═CH—, —COCH₂—, —CH(OH)CH₂—, or —CH₂SO—. U.S. Pat. No. 4,897,445 provides a method of solid phase synthesis of a non-peptide bond (—CH₂—NH) in a polypeptide chain, in which the polypeptide is synthesized by a standard procedure and the non-peptide bond is synthesized by reacting an amino aldehyde and an amino acid in the presence of NaCNBH₃.

A peptide having a sequence of the present invention can be synthesized with an additional chemical group at its amino and/or carboxy terminus in order to enhance, for example, the stability, bioavailability and/or affinity of the peptide. For example, a hydrophobic group such as a t-butyloxycarbonyl group, dansyl group, or carbobenzoxy group can be added to the amino terminus of the peptide. Similarly, an acetyl group or a 9-fluorenylmethoxycarbonyl group can be placed at the amino terminus of the peptide. In addition, for example, a hydrophobic group, a t-butyloxycarbonyl group, or an amido group can be added to the carboxy terminus of the peptide.

Furthermore, the peptide used in the present invention can be synthesized with an altered steric configuration. For example, the D-isomer of one or more amino acid residues of the peptide can be used instead of the common L-isomer. Furthermore, at least one amino acid residue of the peptide of the present invention can be substituted with a well-known non-natural amino acid residue. Such an alteration can increase the stability, bioavailability and/or binding action of the peptide of the present invention.

Similarly, the peptide or variant of the present invention can be chemically modified by reaction with specific amino acids before or after synthesis of the peptide. Examples of such modifications are well known in the art and are summarized, for example, in R. Lundblad, Chemical Reagents for Protein Modification, 3rd ed., CRC Press, 2005, incorporated herein by reference. Examples of chemical modification of an amino acid include, but are not limited to, modification by acylation, amidination, pyridoxylation of lysine, reductive alkylation, trinitrobenzylation of an amino group with 2,4,6-trinitrobenzenesulfonic acid (TNBS), amide modification of a carboxyl group, sulfhydryl modification by performic acid oxidation of cysteine to cysteic acid, formation of mercury derivatives, formation of mixed disulfides with other thiol compounds, reaction with maleimide, carboxymethylation with iodoacetic acid or iodoacetamide, and carbamoylation with cyanate at an alkaline pH. In this regard, those skilled in the art can refer to Current Protocols In Protein Science, Eds. Coligan et al. (John Wiley & Sons NY 1995-2000) for a broader methodology related to chemical modification of proteins. For example, arginine residues of a protein are often modified by forming an adduct through reaction with vicinal dicarbonyl compounds such as 1,2-cyclohexanedione, 2,3-butanedione, and phenylglyoxal. Another example is the reaction of an arginine residue with methylglyoxal. Cysteine can be modified without simultaneous modification of other nucleophilic sites such as lysine or histidine; for this reason, numerous reagents are available for cysteine modification. Information on specific reagents is available from Pierce Chemical Company, Sigma-Aldrich, and other websites.

Disulfide bonds in proteins are often selectively reduced. Disulfide bonds can also be formed and oxidized during heat treatment of biopharmaceuticals. Specific glutamic acid residues can be modified by using Woodward's Reagent K. An intermolecular crosslink between a lysine residue and a glutamic acid residue can be formed by using N-(3-(dimethylamino)propyl)-N′-ethylcarbodiimide. For example, diethylpyrocarbonate is a reagent for modifying histidine residues in proteins. Histidine can also be modified by using 4-hydroxy-2-nonenal. The reaction of lysine residues and other α-amino groups is useful, for example, for binding a peptide to a surface or for crosslinking proteins/peptides. Lysine is the site of poly(ethylene)glycol attachment and the main site of modification in protein glycation. Methionine residues of a protein can be modified, for example, by iodoacetamide, bromoethylamine, or chloramine-T. Tetranitromethane and N-acetylimidazole can be used to modify tyrosyl residues. Crosslinking through dityrosine formation can be accomplished with hydrogen peroxide/copper ions. N-bromosuccinimide, 2-hydroxy-5-nitrobenzyl bromide and 3-bromo-3-methyl-2-(2-nitrophenylmercapto)-3H-indole (BPNS-skatole) have been used in recent studies of tryptophan modification. Successful modification of therapeutic proteins and peptides with PEG is often associated with a prolonged circulation half-life. In addition, crosslinking of proteins with glutaraldehyde, polyethylene glycol diacrylate and formaldehyde is used to prepare hydrogels. Chemical modification of allergens for immunotherapy is often accomplished by carbamylation with potassium cyanate.

In general, the peptides and variants used in the present invention (at least those comprising peptide linkages between amino acid residues) can be synthesized, for example, by the Fmoc-polyamide mode of solid phase peptide synthesis disclosed in Lu et al (1981) J. Org. Chem. 46, 3433 and references therein. Purification can be performed by one or a combination of techniques such as recrystallization, size exclusion chromatography, ion exchange chromatography, hydrophobic interaction chromatography, and (generally) reversed phase high performance liquid chromatography using, for example, acetonitrile/water gradient separation. A peptide can be analyzed by using thin-layer chromatography, electrophoresis, in particular capillary electrophoresis, solid phase extraction (CSPE), reversed phase high performance liquid chromatography, amino acid analysis after acid hydrolysis, fast atom bombardment (FAB) mass spectrometry, and MALDI and ESI-Q-TOF mass spectrometry.

In still another aspect of the present invention, a nucleic acid (e.g., a polynucleotide) encoding the peptide of the present invention or a variant thereof is provided. Such a polynucleotide can be, for example, DNA, cDNA, PNA, CNA, RNA, single-stranded and/or double-stranded, a natural or stabilized form of polynucleotide such as a polynucleotide having a phosphorothioate backbone, or a combination thereof. The polynucleotide need not contain introns as long as it encodes the peptide. Naturally, only peptides composed of naturally occurring amino acid residues joined by naturally occurring peptide bonds can be encoded by a polynucleotide. In yet another embodiment of the present invention, an expression vector capable of expressing the polypeptide according to the present invention is provided. Expression vectors for different cell types are well known in the art and can be selected without undue experimentation.

In general, DNA is inserted into an expression vector such as a plasmid in the correct orientation and the correct reading frame for expression. If necessary, the DNA can be linked to suitable transcriptional/translational regulatory nucleotide sequences recognized by the desired host, although such control functions are generally available in the expression vector. The vector is then introduced into the host by standard techniques. In this regard, see Sambrook et al (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

The optimal dose and amount of each peptide contained in a vaccine, and the optimal dosing regimen, can be determined by those skilled in the art without undue experimentation. For example, the peptide or a mutant form thereof can be prepared for intravenous (i.v.), subcutaneous (s.c.), intradermal (i.d.), intraperitoneal (i.p.), or intramuscular (i.m.) injection. Preferred routes of administration for peptide injection are s.c., i.d., i.p., i.m., and i.v. Preferred routes of administration for DNA injection are i.d., i.m., s.c., i.p., and i.v. For example, 1-500 mg, 50 μg-1.5 mg, preferably 125 μg-500 μg of peptide or DNA can be administered, depending on the respective peptide or DNA. Doses in this range have been used successfully in clinical trials (Brunsvig P F, Aamdal S, Gjertsen M K, Kvalheim G, Markowski-Grimsrud C J, Sve I, Dyrhaug M, Trachsel S, Muller M, Eriksen J A, Gaudernack G; Telomerase peptide vaccination: a phase I/II study in patients with non-small cell lung cancer; Cancer Immunol Immunother. 2006; 55(12): 1553-1564; M. Staehler, A. Stenzl, P. Y. Dietrich, T. Eisen, A. Haferkamp, J. Beck, A. Mayer, S. Walter, H. Singh, J. Frisch, C. G. Stief; An open label study to evaluate the safety and immunogenicity of the peptide based cancer vaccine IMA901, ASCO meeting 2007; Abstract No 3017).

The selection, number, and/or amount of peptides present in the pharmaceutical composition of the present invention can be made tissue-, cancer-, and/or patient-specific when preparing the composition. For example, side effects can be avoided by guiding the exact selection of peptides by the expression pattern of the proteins in the tissues of a given patient. The selection may depend on the specific cancer type and disease status of the patient receiving therapy, earlier treatment regimens, the immune status of the patient, and, naturally, the HLA haplotype of the patient. Furthermore, the vaccine according to the present invention can contain individualized components according to the needs of a particular patient. Examples include different amounts of peptides according to the expression of related tumor-associated antigens (TAAs), the patient's individual side effects due to allergies or other treatments, and adjustments for secondary treatment following an initial series of treatments for the particular patient.

Peptides whose parent proteins are highly expressed in normal tissue are avoided or are present only in low amounts in the composition of the present invention. Conversely, when a patient's tumor is known to highly express a specific protein, the corresponding pharmaceutical composition for treating that cancer can contain peptides specific to that protein or pathway in high amounts and/or can contain a plurality of such peptides. Those skilled in the art can select preferred combinations of immunogenic peptides by testing, for example, in vitro T cell generation and its efficiency, overall presentation, the expansion, affinity, and proliferation of specific T cells against specific peptides, and T cell functionality, e.g., by analysis of IFN-γ production (see also the Examples below). Usually, the most efficient peptides are then combined into a vaccine for the purposes described above.

A suitable vaccine preferably contains 1-20 peptides, more preferably 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 different peptides, still more preferably 6, 7, 8, 9, 10, 11, 12, 13, or 14 different peptides, and most preferably 14 different peptides. The peptides used as a cancer vaccine may be of any suitable length. Specifically, a peptide can be a suitable 9-mer, a suitable 7-mer, 8-mer, 10-mer, or 11-mer, or a 12-mer, 13-mer, 14-mer, or 15-mer peptide. Longer peptides are also suitable in some cases. As described in the appended Tables 1 and 2, 9-mer or 10-mer peptides are preferred for MHC class I, and 12- to 15-mer peptides are preferred for MHC class II.
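The length and number preferences stated above can be written as a simple screening rule. The sketch below is a hypothetical helper, not part of the described invention: the names `PREFERRED_LENGTHS`, `is_preferred_length`, and `vaccine_size_ok` are illustrative assumptions, and the rule encodes only the preferences given in the text (9- or 10-mers for MHC class I, 12- to 15-mers for MHC class II, and 1 to 20 different peptides per vaccine). It says nothing about immunogenicity or HLA binding.

```python
from __future__ import annotations

# Hypothetical screening helpers that encode only the stated length and count
# preferences; they do not evaluate immunogenicity or HLA binding.

# Preferred peptide lengths per MHC class (the upper bound of range() is exclusive).
PREFERRED_LENGTHS = {
    "I": range(9, 11),    # MHC class I: 9-mer or 10-mer
    "II": range(12, 16),  # MHC class II: 12- to 15-mer
}


def is_preferred_length(peptide: str, mhc_class: str) -> bool:
    """Return True if the peptide length lies in the preferred range for the MHC class."""
    if mhc_class not in PREFERRED_LENGTHS:
        raise ValueError(f"unknown MHC class: {mhc_class!r}")
    return len(peptide) in PREFERRED_LENGTHS[mhc_class]


def vaccine_size_ok(peptides: list[str]) -> bool:
    """Return True if the vaccine contains 1 to 20 different peptides."""
    return 1 <= len(set(peptides)) <= 20
```

For example, a placeholder 9-residue string passes the class I check, while an 11-residue string does not; duplicates are collapsed before counting, since the text speaks of "different peptides".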

The peptide of the present invention constitutes a tumor or cancer vaccine. The tumor or cancer vaccine can be administered directly into the affected organ or systemically to the patient, applied in vitro to cells derived from the patient or to a human cell line that are subsequently administered to the patient, or used in vitro to select a subpopulation of the patient's immune cells that is then re-administered to the patient.

The peptide can be substantially pure, combined with an immunostimulatory adjuvant (see below), used in combination with an immunostimulatory cytokine, or coadministered with a suitable delivery system (e.g., liposomes). The peptide can also be conjugated to a suitable carrier such as keyhole limpet hemocyanin (KLH) or mannan (see WO 95/18145 and Longenecker et al. (1993) Ann. NY Acad. Sci. 690, 276-291). The peptide can also be tagged, or be a fusion protein or a hybrid molecule. The peptides having the sequences given in the present invention are expected to stimulate CD4 or CD8 T cells. However, stimulation is more efficient when help is provided by T cells positive for the opposite CD. Thus, for an MHC class II epitope that stimulates CD4 T cells, the fusion partner or a section of the hybrid molecule suitably provides an epitope that stimulates CD8-positive T cells. Conversely, for an MHC class I epitope that stimulates CD8 T cells, the fusion partner or a section of the hybrid molecule suitably provides an epitope that stimulates CD4-positive T cells. CD4- and CD8-stimulating epitopes are well known in the art and include those identified in the present invention.

To elicit an immune response, it is generally necessary to include excipients that enhance the immunogenicity of the composition. Thus, the pharmaceutical composition of a preferred embodiment of the present invention further comprises at least one suitable adjuvant. An adjuvant used in the present invention is a substance that non-specifically enhances or potentiates the immune response to an antigen (e.g., immune responses mediated by CTLs and helper T (TH) cells) and is therefore considered useful in the medicament of the present invention. Suitable adjuvants include, but are not limited to, 1018 ISS, aluminum salts, Amplivax, AS15, BCG, CP-870,893, CpG7909, CyaA, dSLIM, GM-CSF, IC30, IC31, imiquimod, ImuFact IMP321, IS Patch, ISS, ISCOMATRIX, JuvImmune, LipoVac, MF59, monophosphoryl lipid A, Montanide IMS 1312, Montanide ISA 206, Montanide ISA 50V, Montanide ISA-51, OK-432, OM-174, OM-197-MP-EC, ONTAK, PepTel(R) vector system, PLG microparticles, resiquimod, SRL172, virosomes and other virus-like particles, YF-17D, VEGF trap, R848, β-glucan, Pam3Cys, Aquila's QS21 Stimulon (Aquila Biotech, Worcester, MA, USA) (a saponin-derived substance), mycobacterial extracts and synthetic bacterial cell wall mimetics, and other proprietary adjuvants (Ribi's Detox, Quil, Superfos, or the like). Adjuvants such as incomplete Freund's adjuvant or GM-CSF are preferred. Several immunological adjuvants specific to dendritic cells (e.g., MF59) and their formulations have been described previously (Dupuis M, Murphy T J, Higgins D, Ugozzoli M, van Nest G, Ott G, McDonald D M; Dendritic cells internalize vaccine adjuvant after intramuscular injection; Cell Immunol. 1998; 186(1): 18-27; Allison A C; The mode of action of immunological adjuvants; Dev Biol Stand. 1998; 92: 3-11). Cytokines can also be used. Some cytokines have been directly linked to influencing the migration of dendritic cells to lymphoid tissues (e.g., TNF-α), to accelerating the maturation of dendritic cells into efficient antigen-presenting cells for T lymphocytes (e.g., GM-CSF, IL-1, IL-4) (U.S. Pat. No. 5,849,589, incorporated herein by reference in its entirety), and to acting as immunoadjuvants (e.g., IL-12) (Gabrilovich D I, Cunningham H T, Carbone D P; IL-12 and mutant P53 peptide-pulsed dendritic cells for the specific immunotherapy of cancer; J Immunother Emphasis Tumor Immunol. 1996 (6): 414-418).

CpG immunostimulatory oligonucleotides have also been reported to enhance the effects of adjuvants in a vaccine setting. Without wishing to be bound by any theory, CpG oligonucleotides act by activating the innate (non-adaptive) immune system via Toll-like receptors (TLRs), mainly TLR9. CpG-triggered TLR9 activation enhances antigen-specific humoral and cellular responses to a wide variety of antigens, including peptide or protein antigens, live or killed viruses, dendritic cell vaccines, autologous cellular vaccines, and polysaccharide conjugates, in both prophylactic and therapeutic vaccines. More importantly, it enhances dendritic cell maturation and differentiation, resulting in increased generation of cytotoxic T lymphocytes (CTLs) and increased TH1 cell activity even in the absence of CD4 T cell help. The TH1 bias induced by TLR9 stimulation is maintained even in the presence of vaccine adjuvants, such as alum or incomplete Freund's adjuvant (IFA), that normally promote a TH2 bias. CpG oligonucleotides show even greater adjuvant activity when formulated or coadministered with other adjuvants, or when prepared as microparticles, nanoparticles, lipid emulsions, or similar formulations. CpG oligonucleotides are especially needed to induce a strong response to a relatively weak antigen. They also accelerate the immune response. In some experiments, antibody responses comparable to those of a full-dose vaccine without CpG were obtained with an antigen dose reduced by about two orders of magnitude (Arthur M. Krieg, Therapeutic potential of Toll-like receptor 9 activation, Nature Reviews Drug Discovery, 2006, 5, 471-484). U.S. Pat. No. 6,406,705 B1 describes the combined use of a CpG oligonucleotide, a non-nucleic-acid adjuvant, and an antigen to elicit an antigen-specific immune response. Other TLR-binding molecules, such as RNA binding TLR7, TLR8, and/or TLR9, can also be used.

Examples of other useful adjuvants in the present invention include, but are not limited to, chemically modified CpGs (e.g., CpR, Idera), poly(I:C) (e.g., poly I:C12U), non-CpG bacterial DNA or RNA, imidazoquinolines, cyclophosphamide, sunitinib, bevacizumab, Celebrex, NCX-4016, sildenafil, tadalafil, vardenafil, sorafenib, XL-999, CP-547632, pazopanib, ZD2171, AZD2171, ipilimumab, tremelimumab, and SC58175, which may act therapeutically and/or as adjuvants. In the context of the present invention, the amounts and concentrations of useful adjuvants and additives can be readily determined by those skilled in the art without undue experimentation. Preferred adjuvants are dSLIM, BCG, OK432, imiquimod, PeviTer, and JuvImmune. In a preferred embodiment of the pharmaceutical composition of the present invention, the adjuvant is selected from colony-stimulating factors such as granulocyte-macrophage colony-stimulating factor (GM-CSF, sargramostim). In another preferred embodiment of the pharmaceutical composition of the present invention, the adjuvant is imiquimod.

The composition of the present invention is used for parenteral administration, such as subcutaneous, intradermal, or intramuscular administration, or for oral administration. For this purpose, the peptide and optionally other molecules are dissolved or suspended in a pharmaceutically acceptable, preferably aqueous, carrier. In addition, the composition can contain excipients such as buffers, binding agents, blasting agents, diluents, flavoring agents, or lubricants. Further, the peptide can be coadministered with an immunostimulatory substance such as a cytokine. An extensive list of excipients that can be used in such a composition can be found in, for example, A. Kibbe, Handbook of Pharmaceutical Excipients, 3rd Ed., 2000, American Pharmaceutical Association and Pharmaceutical Press. The composition can be used for the prevention and/or treatment of tumors or cancer, preferably colorectal cancer (CRC).

A cytotoxic T cell (CTL) recognizes an antigen in the form of a peptide bound to an MHC molecule, rather than the intact foreign antigen itself. The MHC molecule itself is located on the cell surface of an antigen-presenting cell (APC). Thus, CTL activation is possible only in the presence of a trimeric complex of peptide antigen, MHC molecule, and APC, and CTLs are not activated by the peptide alone. The immune response is enhanced by additionally adding APCs bearing the respective MHC molecules. Thus, in a preferred embodiment, the pharmaceutical composition of the present invention additionally contains at least one antigen-presenting cell.

An antigen-presenting cell (or stimulator cell) generally has MHC class I or II molecules on its surface. In one embodiment, the cell is substantially incapable of itself loading its MHC class I or II molecules with the selected antigen. As described in detail below, the MHC class I or II molecules can readily be loaded with the selected antigen in vitro.

In general, the pharmaceutical composition of the present invention comprising the nucleic acid of the present invention can be administered by the same methods as a pharmaceutical composition comprising the peptide of the present invention, i.e., by intravenous, intra-arterial, intraperitoneal, intramuscular, intradermal, intratumoral, oral, transdermal, transnasal, transoral (via the oral cavity), transrectal, or transvaginal administration, by inhalation, or by topical administration.

Tumors often acquire resistance to therapeutic agents through escape mechanisms. Drug resistance develops during therapy and in some cases manifests as metastatic or recurrent tumors. To avoid such resistance, tumors are generally treated with combinations of drugs, and metastases or tumors recurring after an event-free period often require a different combination. Thus, in one embodiment of the present invention, the pharmaceutical composition is administered together with a second anticancer agent. The second agent can be administered before, after, or simultaneously with the pharmaceutical composition of the present invention. For example, if their chemical properties are compatible, simultaneous administration can be achieved by mixing the pharmaceutical composition of the present invention with the second anticancer agent. Another form of simultaneous administration is to give the composition and the anticancer agent on the same day by different routes, for example by injecting the pharmaceutical composition of the present invention and orally administering the second anticancer agent. The pharmaceutical composition and the second anticancer agent may also be administered in separate treatment courses, or in the same treatment course on different days.

In another aspect of the present invention, a method of treating or preventing cancer in a patient is provided. The method comprises administering to the patient a therapeutically effective amount of any of the pharmaceutical compositions of the present invention. A therapeutically effective amount is an amount sufficient to elicit an immune response, in particular to activate a subpopulation of CTLs. Those skilled in the art can readily determine an effective amount using standard immunological methods such as those provided in the Examples of this specification. Another way to monitor the effect of a given amount of the pharmaceutical composition of the present invention is to observe the growth and/or recurrence of the treated tumor.

In an especially preferred embodiment of the present invention, the pharmaceutical composition is used as an anticancer vaccine.

A composition comprising the peptide of the present invention or a nucleic acid encoding the peptide can constitute a tumor or cancer vaccine. The tumor or cancer vaccine can be administered directly into the affected organ or systemically to the patient, applied in vitro to cells derived from the patient or to a human cell line that are subsequently administered to the patient, or used in vitro to select a subpopulation of the patient's immune cells that is then re-administered to the patient.

The composition of the present invention can be used as a vaccine or in a method of treating cancer. The cancer is oral cavity or pharyngeal cancer, gastrointestinal cancer, colon, rectal, or anal cancer, airway cancer, breast cancer, uterine, vaginal, or vulvar cancer, endometrial or ovarian cancer, male genital tract cancer, urethral cancer, bone and soft tissue cancer, Kaposi's sarcoma, cutaneous melanoma, ocular melanoma, non-melanoma ocular cancer, brain or central nervous system cancer, thyroid and other endocrine gland cancer, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or myeloma, and preferably renal cancer, colorectal carcinoma, lung cancer, breast cancer, pancreatic cancer, prostate cancer, gastric cancer, brain cancer, GIST, or glioblastoma. According to the present invention, the preferred amount of peptide can vary between about 0.1 and 100 mg in 500 μl of solution, preferably about 0.1-1 mg, and most preferably about 300 μg-800 μg. In this regard, the term "about" means +/-10 percent of the given value unless otherwise specified. Those skilled in the art can adjust the actual amount of peptide to be used based on factors such as the immune status of the individual patient and/or the amount of tumor-associated peptide (TUMAP) presented in a particular type of cancer. The peptide of the present invention may be provided in a suitable form (a sterile solution or the like) other than a freeze-dried peptide.
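The dose statements above combine a peptide amount, a 500 μl volume, and a +/-10 percent definition of "about". A minimal arithmetic sketch, assuming the amounts refer to a single 500 μl dose solution; the helper names `about` and `concentration_mg_per_ml` are hypothetical, and the unit conversion relies only on 1 μg/μl being equal to 1 mg/mL.

```python
from __future__ import annotations


def about(value: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Interval covered by 'about' a value: +/-10 percent unless stated otherwise."""
    return value * (1.0 - tolerance), value * (1.0 + tolerance)


def concentration_mg_per_ml(amount_ug: float, volume_ul: float) -> float:
    """Concentration (mg/mL) of a peptide amount in micrograms dissolved in volume_ul microliters."""
    # 1 ug/uL equals 1 mg/mL, so the ratio needs no further scaling.
    return amount_ug / volume_ul
```

For the most preferred range of about 300 μg-800 μg in 500 μl, `about(300.0)` spans 270 μg to 330 μg and `about(800.0)` spans 720 μg to 880 μg, and the nominal concentrations are 0.6 mg/mL and 1.6 mg/mL, respectively.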

The pharmaceutical composition of the present invention containing the peptide and/or nucleic acid according to the present invention is administered to a patient with an adenomatous or cancerous disease associated with the corresponding peptide or antigen, thereby triggering a T cell-mediated immune response. Preferably, the amount of the peptide (especially a tumor-associated peptide), the nucleic acid, or the expression vector in the pharmaceutical composition of the present invention is specific to the tissue, cancer, and/or patient.

In another embodiment of the present invention, the vaccine of the present invention is a nucleic acid vaccine. It is well known that inoculation with a nucleic acid vaccine, such as a DNA vaccine encoding a polypeptide, elicits a T cell response. The tumor or cancer vaccine can be administered directly into the affected organ or systemically to the patient, applied in vitro to cells derived from the patient or to a human cell line that are subsequently administered to the patient, or used in vitro to select a subpopulation of the patient's immune cells that is then re-administered to the patient. When the nucleic acid is introduced into cells in vitro, it may be useful in some cases to transfect the cells so that they co-express an immunostimulatory cytokine such as interleukin-2 or GM-CSF. The nucleic acid can be substantially pure, combined with an immunostimulatory adjuvant, used in combination with an immunostimulatory cytokine, or coadministered with a suitable delivery system (e.g., liposomes). The nucleic acid vaccine may be administered with any of the adjuvants described above for the peptide vaccine; preferably, however, the nucleic acid vaccine is administered without an adjuvant.

In some cases, the polynucleotide of the present invention is substantially pure or is contained in a suitable vector or delivery system. Suitable vectors and delivery systems include viral systems, such as adenovirus, vaccinia virus, retrovirus, herpes virus, adeno-associated virus, or hybrid systems containing elements of more than one virus. Non-viral delivery systems include cationic lipids and cationic polymers well known in the field of DNA delivery. Physical delivery, such as via a "gene gun," can also be used. The peptide, or the peptide encoded by the nucleic acid, may in some cases be a fusion protein, for example with an epitope from tetanus toxoid that stimulates CD4-positive T cells.

Suitably, all peptides administered to the patient are sterile and pyrogen-free. Naked DNA can be administered by intramuscular, intradermal, or subcutaneous injection. Conveniently, the nucleic acid vaccine can use any nucleic acid delivery means. Preferably, the nucleic acid, which is DNA, is delivered in liposomes or as part of a viral vector delivery system.

A nucleic acid vaccine such as a DNA vaccine is preferably administered intramuscularly, whereas a peptide vaccine is preferably administered s.c. or i.d. Intradermal administration of the vaccine is also preferred.

Expression of the encoded polypeptide and uptake of the nucleic acid by professional antigen-presenting cells, such as dendritic cells, may be the mechanism by which the immune response is primed. Even if the dendritic cells themselves are not transfected, they remain important because they can take up the expressed peptide from transfected cells in the tissue ("cross-priming"; see, e.g., Thomas A M, Santarsiero L M, Lutz E R, Armstrong T D, Chen Y C, Huang L Q, Laheru D A, Goggins M, Hruban R H, Jaffee E M. Mesothelin-specific CD8(+) T cell responses provide evidence of in vivo cross-priming by antigen-presenting cells in vaccinated pancreatic cancer patients. J Exp Med. 2004 Aug. 2; 200(3): 297-306).

Polynucleotide-mediated cancer immunotherapy is described in Conry et al. (1996) Seminars in Oncology 23, 135-147; Condon et al. (1996) Nature Medicine 2, 1122-1127; Gong et al. (1997) Nature Medicine 3, 558-561; Zhai et al. (1996) J. Immunol. 156, 700-710; Graham et al. (1996) Int J. Cancer 65, 664-670; and Burchell et al. (1996) 309-313 In: Breast Cancer, Advances in Biology and Therapeutics, Calvo et al. (eds), John Libbey Eurotext, all of which are incorporated herein by reference in their entirety.

It may also be useful in the present invention to administer the peptide or nucleic acid ex vivo and to target the vaccine of the present invention to a specific cell population, such as antigen-presenting cells, by selectively purifying that cell population from the patient or by the choice of delivery system, targeting vector, and injection site (for example, dendritic cells can be sorted as described in Zhou et al. (1995) Blood 86, 3295-3301 and Roth et al. (1996) Scand. J. Immunology 43, 646-651). For example, a targeting vector can contain a tissue- or tumor-specific promoter that directs expression of the antigen at a suitable location.

The composition of the vaccine of the present invention may depend on the specific cancer type and disease status of the patient receiving therapy, earlier treatment regimens, the immune status of the patient, and, naturally, the HLA haplotype of the patient. Furthermore, the vaccine according to the present invention can contain individualized components according to the needs of a particular patient. Examples include different amounts of peptides according to the expression of related TAAs, the patient's individual side effects due to allergies or other treatments, and adjustments for secondary treatment following an initial series of treatments for the particular patient.

The peptides of the present invention are useful not only in cancer therapy but also in diagnosis. Because the peptides were generated from glioblastoma and were identified as absent from normal tissue, they can be used to diagnose the presence of cancer.

The presence of the peptides of the present invention in tissue biopsies can assist a pathologist in diagnosing cancer. Detection of specific peptides of the present invention by means of antibodies, mass spectrometry, or other methods known in the art can tell the pathologist whether the tissue is malignant, inflamed, or generally diseased. The presence of groups of peptides of the present invention enables classification or sub-classification of diseased tissue.

Detection of the peptides of the present invention in diseased tissue samples makes it possible to decide on the benefit of therapies involving the immune system, especially when T lymphocytes are known or expected to be involved in the mechanism of action. Loss of MHC expression is a well-characterized mechanism by which malignant cells escape immune surveillance. Thus, the presence of the peptides of the present invention indicates that this mechanism is not exploited by the analyzed cells.

The peptides of the present invention can be used to analyze lymphocyte responses against them, such as antibody responses or T cell responses to the peptides or to complexes of the peptides with MHC molecules. These lymphocyte responses can be used as prognostic markers for deciding on further therapeutic steps. They can also be used as surrogate markers in immunotherapeutic approaches that aim to induce lymphocyte responses by different means, such as vaccination with proteins, nucleic acids, or endogenous substances, or adoptive transfer of lymphocytes. In gene therapy settings, lymphocyte responses against the peptides of the present invention can be considered in the assessment of side effects. Monitoring lymphocyte responses may also be useful in follow-up examinations after transplantation therapy, e.g., for detecting graft-versus-host and host-versus-graft diseases.

The peptides of the present invention can be used to generate and develop antibodies specific to MHC/peptide complexes. Such antibodies can be used therapeutically to target toxins or radioactive substances to diseased tissue. Another use of these antibodies is to target radionuclides to diseased tissue for imaging methods such as PET, which can help detect small metastases or determine the precise location and size of diseased tissue. In addition, the peptides can be used to verify a pathologist's cancer diagnosis based on a biopsy sample.

The present invention can be provided as a kit. As used herein, "kit" refers to a unit in which the components to be provided (e.g., test agent, diagnostic agent, therapeutic agent, antibody, label, instructions, and the like) are generally provided in two or more separate sections. This form of kit is preferred when the intention is to provide a composition whose components should not be supplied as a mixture, for example for safety reasons, and are preferably mixed immediately before use. Such a kit advantageously contains instructions describing how to use the provided components (e.g., test agent, diagnostic agent, or therapeutic agent) or how to handle the reagents. When the kit is used herein as a reagent kit, it generally contains instructions describing how to use the test agent, diagnostic agent, therapeutic agent, antibody, and the like.

Accordingly, in a further aspect, the present invention is directed to a kit comprising (a) a container containing the pharmaceutical composition of the present invention in solution or freeze-dried form, (b) optionally a second container containing a diluent or reconstitution solution for the freeze-dried formulation, and (c) optionally instructions for (i) use of the solution or (ii) reconstitution and/or use of the freeze-dried formulation. The kit further comprises one or more of (iii) a buffer, (iv) a diluent, (v) a filter, (vi) a needle, or (vii) a syringe. The container is preferably a bottle, vial, syringe, test tube, or multi-use container. The pharmaceutical composition is preferably freeze-dried.

The kit of the present invention preferably contains the freeze-dried formulation of the present invention in a suitable container together with instructions for its reconstitution and/or use. Suitable containers include, for example, bottles, vials (e.g., dual-chamber vials), syringes (e.g., dual-chamber syringes), and test tubes. The container can be formed from a variety of materials such as glass or plastic. Preferably, the kit and/or container bears instructions on, or associated with, the container that indicate directions for reconstitution and/or use. For example, the label can indicate that the freeze-dried formulation is to be reconstituted to the peptide concentration described above. The label can further indicate that the formulation is useful or intended for subcutaneous injection.

The container holding the formulation may be a multi-use vial that allows repeated administration (e.g., 2 to 6 administrations). The kit can further contain a second container with a suitable diluent (e.g., sodium bicarbonate solution).

The final peptide concentration of the reconstituted formulation obtained by mixing the diluent with the freeze-dried formulation is preferably at least 0.15 mg/mL/peptide (=75 μg) and preferably not more than 3 mg/mL/peptide (=1500 μg). The kit can further contain other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts.
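The paired values above are consistent with a final volume of 0.5 mL (0.15 mg/mL × 0.5 mL = 75 μg; 3 mg/mL × 0.5 mL = 1500 μg); that volume is inferred from the arithmetic and is not stated in the text. A minimal sketch under that assumption, with hypothetical helper names (`peptide_mass_ug`, `diluent_volume_ml`, `in_preferred_window`). It also assumes the final volume equals the added diluent volume, i.e., it ignores the volume contributed by the dried solids.

```python
# Preferred per-peptide concentration window of the reconstituted formulation (mg/mL).
MIN_CONC_MG_PER_ML = 0.15
MAX_CONC_MG_PER_ML = 3.0


def peptide_mass_ug(conc_mg_per_ml: float, volume_ml: float) -> float:
    """Mass of one peptide (ug) in a reconstituted volume: c x V, converted from mg to ug."""
    return conc_mg_per_ml * volume_ml * 1000.0


def diluent_volume_ml(mass_ug: float, target_conc_mg_per_ml: float) -> float:
    """Diluent volume (mL) that brings mass_ug of one peptide to the target concentration.

    Assumes the final volume equals the diluent volume (dried solids ignored).
    """
    if target_conc_mg_per_ml <= 0:
        raise ValueError("target concentration must be positive")
    return (mass_ug / 1000.0) / target_conc_mg_per_ml


def in_preferred_window(conc_mg_per_ml: float) -> bool:
    """True if a per-peptide concentration lies within the preferred 0.15-3 mg/mL window."""
    return MIN_CONC_MG_PER_ML <= conc_mg_per_ml <= MAX_CONC_MG_PER_ML
```

Under these assumptions, `peptide_mass_ug(0.15, 0.5)` reproduces the 75 μg figure, and `diluent_volume_ml(75.0, 0.15)` recovers the 0.5 mL volume.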

The kit of the present invention can have a single container containing the formulation of the pharmaceutical composition of the present invention, with or without other components (e.g., other compounds or pharmaceutical compositions of these other compounds). Alternatively, the kit can have a separate container for each component.

Preferably, the kit of the present invention contains a formulation of the present invention packaged for use in combination with the coadministration of a second compound (such as an adjuvant (e.g., GM-CSF), a chemotherapeutic agent, a natural product, a hormone or antagonist, an anti-angiogenic agent or angiogenesis inhibitor, an apoptosis inducer, or a chelating agent) or a pharmaceutical composition thereof. The components of the kit can be pre-complexed, or each component can be kept in a separate container until administration to the patient. The components of the kit can be provided as one or more liquid solutions, preferably aqueous solutions, and more preferably sterile aqueous solutions. The components of the kit can also be provided as solids, which can preferably be converted into liquids by adding a suitable solvent provided in another, separate container.

The container of a therapeutic kit can be a vial, test tube, flask, bottle, syringe, or any other means of enclosing a solid or liquid. When there is more than one component, the kit generally contains a second vial or other container so that the components can be administered separately. The kit can also contain another container for a pharmaceutically acceptable liquid. Preferably, a therapeutic kit contains a device (e.g., one or more needles, syringes, droppers, pipettes, or the like) that enables administration of the agents of the present invention that are components of the kit.

The pharmaceutical composition of the present invention is suitable for administering the peptide by any acceptable route, such as oral (enteral), transnasal, ocular, subcutaneous, intradermal, intramuscular, intravenous, or transdermal administration. Administration is preferably subcutaneous and most preferably intradermal. Administration can be performed with an infusion pump.

As used herein, "instructions" refers to a document explaining to a physician or other user how to use the present invention. The instructions describe the detection method of the present invention, how to use a diagnostic agent, or directions for administering a medicament or the like. The instructions may also direct oral administration or administration to the esophagus (e.g., by injection or the like) as the site of administration. The instructions are prepared in accordance with the format prescribed by the regulatory authority of the country in which the present invention is practiced (e.g., the Ministry of Health, Labour and Welfare in Japan or the Food and Drug Administration (FDA) in the U.S.), and explicitly state approval by that authority. The instructions are a so-called package insert and are typically provided on paper, but are not limited to that form; they may also be provided electronically (e.g., on websites on the Internet or by e-mail).

(General Techniques)

Molecular biological, biochemical, and microbiological techniques used herein are well known and conventional in the art and are described in, for example, Sambrook J. et al. (1989). Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, and the 3rd Ed. thereof (2001); Ausubel, F. M. (1987). Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience; Ausubel, F. M. (1989). Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience; Innis, M. A. (1990). PCR Protocols: A Guide to Methods and Applications, Academic Press; Ausubel, F. M. (1992). Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, Greene Pub. Associates; Ausubel, F. M. (1995). Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, Greene Pub. Associates; Innis, M. A. et al. (1995). PCR Strategies, Academic Press; Ausubel, F. M. (1999). Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, Wiley, and annual updates; Sninsky, J. J. et al. (1999). PCR Applications: Protocols for Functional Genomics, Academic Press; Gait, M. J. (1985). Oligonucleotide Synthesis: A Practical Approach, IRL Press; Gait, M. J. (1990). Oligonucleotide Synthesis: A Practical Approach, IRL Press; Eckstein, F. (1991). Oligonucleotides and Analogues: A Practical Approach, IRL Press; Adams, R. L. et al. (1992). The Biochemistry of the Nucleic Acids, Chapman & Hall; Shabarova, Z. et al. (1994). Advanced Organic Chemistry of Nucleic Acids, Weinheim; Blackburn, G. M. et al. (1996). Nucleic Acids in Chemistry and Biology, Oxford University Press; Hermanson, G. T. (1996). Bioconjugate Techniques, Academic Press; Bessatsu Jikken Igaku [Experimental Medicine, Supplemental Volume], Idenshi Donyu Oyobi Hatsugen Kaiseki Jikken Ho [Experimental Methods for Transgenesis & Expression Analysis], Yodosha, 1997; and the like, the relevant portions (which can be the entire document) of which are incorporated herein by reference.

Reference literature such as scientific literature, patents, and patent applications cited herein is incorporated herein by reference to the same extent as if the entirety of each document were specifically described.

As described above, the present invention has been described with reference to preferred embodiments to facilitate understanding. The present invention is described below based on Examples. The foregoing description and the following Examples are provided for the sole purpose of exemplification and not to limit the present invention. Thus, the scope of the present invention is not limited to the embodiments and Examples specifically described herein and is limited only by the scope of the claims.

EXAMPLES

(Examples of Preparation of Unbiasedly Amplified Sample)

(Preparation Example 1: Analysis of BCR Repertoire in Peripheral Blood of Healthy Individuals)

In the present Example, BCR repertoire analysis was performed on peripheral blood of healthy individuals.

(Materials and Methods)

Sample: Peripheral blood mononuclear cells of healthy individuals

Method:

(1. RNA Extraction)

5 mL of whole blood was collected from the healthy individuals into a heparin-containing blood collection tube, and peripheral blood mononuclear cells (PBMCs) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from 5×10⁶ of the isolated PBMCs using the RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany). The resulting RNA was quantified by absorbance at 260 nm (A260) using an absorption spectrometer; the concentration was 232 ng/μL in 30 μL of eluate.

(2. Synthesis of Complementary DNA and Double-Stranded Complementary DNA)

The extracted RNA sample was used for adaptor-ligation PCR. First, to synthesize complementary DNA, the BSL-18E primer (Table 1-1) and 3.5 μL (812 ng) of RNA were mixed and annealed for 8 minutes at 70° C. After cooling on ice, a reverse transcription reaction with the following composition was performed in the presence of an RNase inhibitor (RNasin) to synthesize complementary DNA.

TABLE 1-1A Synthesis of complementary DNA
Reagent | Content (μL) | Final concentration
RNA solution | 3.5 |
200 μM BSL-18E | 1.5 | 30 μM
Total | 5 |
Annealing: 70° C., 8 minutes
5× First strand buffer | 2 | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂
0.1 M DTT | 1 | 10 mM
10 mM dNTPs | 0.5 | 500 μM
RNasin (Promega) | 0.5 | 2 U/μL
SuperScript III™, 200 U/μL (Invitrogen) | 1 | 20 U/μL
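The final concentrations in Table 1-1A follow from C₁V₁ = C₂V₂ once the 5 μL annealing mix is brought to 10 μL by the reverse transcription reagents, so the 30 μM listed for BSL-18E refers to the complete 10 μL reaction rather than to the annealing mix. The short Python sketch below reproduces this arithmetic and the 812 ng RNA input (3.5 μL × 232 ng/μL). The function name is illustrative only, and RNasin is left out of the concentration checks because its stock concentration is not stated in the table.

```python
def final_concentration(stock, volume_ul, total_ul):
    """Dilution rule C1*V1 = C2*V2 solved for C2 (same units as the stock)."""
    return stock * volume_ul / total_ul

# Volumes (uL) from Table 1-1A; final concentrations refer to the full
# reverse-transcription reaction, not the 5 uL annealing mix.
volumes_ul = {
    "RNA solution": 3.5,
    "200 uM BSL-18E": 1.5,
    "5x first strand buffer": 2.0,
    "0.1 M DTT": 1.0,
    "10 mM dNTPs": 0.5,
    "RNasin": 0.5,
    "SuperScript III (200 U/uL)": 1.0,
}
total_ul = sum(volumes_ul.values())  # 10.0 uL

bsl18e_um = final_concentration(200, 1.5, total_ul)    # 30 uM
dtt_mm = final_concentration(100, 1.0, total_ul)       # 0.1 M = 100 mM -> 10 mM
dntp_um = final_concentration(10_000, 0.5, total_ul)   # 10 mM = 10,000 uM -> 500 uM
rt_u_per_ul = final_concentration(200, 1.0, total_ul)  # 20 U/uL
buffer_x = final_concentration(5, 2.0, total_ul)       # 5x -> 1x

rna_input_ng = 232 * 3.5  # 232 ng/uL x 3.5 uL = 812 ng
```

Computing the primer concentration against the 5 μL annealing volume instead would give 60 μM, which is why the table's 30 μM only makes sense for the 10 μL reaction.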

The complementary DNA was subsequently incubated for 2 hours at 16° C. in the following double-stranded DNA synthesis buffer in the presence of E. coli DNA polymerase I, E. coli DNA ligase, and RNase H to synthesize double-stranded complementary DNA. T4 DNA polymerase was then added and reacted for 5 minutes at 16° C. to perform a 5′ terminal blunting reaction.

TABLE 1-1B Synthesis of double-stranded complementary DNA
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 9 |
Sterilized water | 46.5 |
5× Second strand buffer | 15 | 25 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.15 mM β-NAD⁺, 1.2 mM DTT
10 mM dNTPs | 1.5 | 0.2 mM
E. coli DNA ligase, 10 U/μL (Invitrogen) | 0.5 | 0.067 U/μL
E. coli DNA polymerase, 10 U/μL (Invitrogen) | 2 | 0.27 U/μL
RNase H, 2 U/μL (Invitrogen) | 0.5 | 0.013 U/μL
Total | 75 |
16° C., 2 hours
T4 DNA polymerase, 5 U/μL (Invitrogen) | 1 | 0.067 U/μL
16° C., 5 minutes

After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the double-stranded DNA was incubated overnight at 16° C. with the P20EA/10EA adaptor (Table 1-1) and T4 DNA ligase in the following T4 ligase buffer for a ligation reaction.

TABLE 1-1 Primer sequences
Primer | Sequence
BSL-18E | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 1)
P20EA | TAATACGACTCCGAATTCCC (SEQ ID NO: 2)
P10EA | GGGAATTCGG (SEQ ID NO: 3)
B-P20EA | Adaptor B-TAATACGACTCCGAATTCCC (SEQ ID NO: 4)
CM1 | TCCTGTGCGAGGCAGCCAA (SEQ ID NO: 5)
CM2 | GTATCCGACGGGGAATTCTC (SEQ ID NO: 6)
CM3-GS | Adaptor A (SEQ ID NO: 39)-key (TCAG)-MID1 (SEQ ID NO: 40)-AAAGGGTTGGGGCGGATGC (SEQ ID NO: 1387) (entire primer is SEQ ID NO: 7)
CA1 | GCTGGCTGCTCGTGGTGTAC (SEQ ID NO: 8)
CA2 | GGGAAGTTTCTGGCGGTCACG (SEQ ID NO: 9)
CA3-GS | Adaptor A (SEQ ID NO: 39)-key (TCAG)-MID2 (SEQ ID NO: 41)-CCGCTTTCGCTCCAGGTCAC (SEQ ID NO: 1388) (entire primer is SEQ ID NO: 10)
CG1 | CACCTTGGTGTTGCTGGGCTT (SEQ ID NO: 11)
CG2 | TCCTGAGGACTGTAGGACAGC (SEQ ID NO: 12)
CG3-GS | Adaptor A (SEQ ID NO: 39)-key (TCAG)-MID3 (SEQ ID NO: 42)-TGAGTTCCACGACACCGTCAC (SEQ ID NO: 1389) (entire primer is SEQ ID NO: 13)
CD1 | GTCCCGTCTTTGTATCTCAG (SEQ ID NO: 14)
CD2 | TCTGTGTCCCCATGTACC (SEQ ID NO: 15)
CD3-GS | Adaptor A (SEQ ID NO: 39)-key (TCAG)-MID4 (SEQ ID NO: 43)-CCCAGTTATCAAGCATGCC (SEQ ID NO: 1390) (entire primer is SEQ ID NO: 16)
CE1 | CATAGTGACCAGAGAGCG (SEQ ID NO: 17)
CE2 | GTGGCTGGTAAGGTCATAG (SEQ ID NO: 18)
CE3-GS | Adaptor A (SEQ ID NO: 39)-key (TCAG)-MID5 (SEQ ID NO: 44)-CATTGGAGGGAATGTTTTTG (SEQ ID NO: 1391) (entire primer is SEQ ID NO: 19)
Adaptor A sequence | CCATCTCATCCCTGCGTGTCTCCGAC (SEQ ID NO: 39)
Adaptor B sequence | CCTATCCCCTGTGTGCCTTGGCAGTC (SEQ ID NO: 1375)
*MID: tag sequence
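The P20EA/P10EA pair in Table 1-1 can form an adaptor because P10EA is the reverse complement of the last 10 nucleotides of P20EA: annealing the two gives a partially double-stranded adaptor whose duplex end is flush at the 3′ end of P20EA, leaving the first 10 nucleotides of P20EA single-stranded. A minimal Python check of this complementarity (the helper name is illustrative, not from the source):

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

P20EA = "TAATACGACTCCGAATTCCC"  # SEQ ID NO: 2
P10EA = "GGGAATTCGG"            # SEQ ID NO: 3

# P10EA pairs with the 3'-terminal 10 nt of P20EA, so the duplex is flush there.
paired_region = reverse_complement(P10EA)
assert P20EA.endswith(paired_region)

# The remaining 5' part of P20EA stays single-stranded in the adaptor.
single_stranded_5prime = P20EA[: len(P20EA) - len(P10EA)]
```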

TABLE 1-1C Adaptor adding reaction
Reagent | Content (μL) | Final concentration
Complementary double-stranded DNA solution | 12.5 |
T4 ligase buffer | 5 | 50 mM Tris-HCl, pH 7.6, 10 mM MgCl₂, 1 mM ATP, 5% PEG 5000, 1 mM DTT
50 μM P20EA/10EA adaptor | 5 | 10 μM
T4 DNA ligase, 1 U/μL (Invitrogen) | 2.5 | 0.1 U/μL
Total | 25 |
16° C., overnight

The adaptor-ligated double-stranded DNA was column-purified as described above and then digested with NotI restriction enzyme (50 U/μL, Takara) using the following composition to remove the adaptor added to the 3′ terminus.

TABLE 1-1D Restriction enzyme treatment
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 34 |
10× restriction enzyme buffer | 5 | 50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM DTT, 100 mM NaCl
0.1% BSA | 5 | 0.01%
0.1% Triton X-100 | 5 | 0.01%
NotI, 50 U/μL (Takara) | 1 | 1 U/μL
Total | 50 |
37° C., 2 hours
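NotI digestion can remove the 3′-side adaptor selectively because the NotI recognition sequence (GCGGCCGC, cut as GC^GGCCGC on the top strand) is built into the BSL-18E reverse transcription primer, which marks the poly(A) (3′) end of the cDNA, whereas the P20EA adaptor sequence contains no NotI site. A small Python sketch that scans a sequence for NotI sites (the function and constant names are hypothetical):

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence
NOTI_CUT_OFFSET = 2     # top-strand cut position within the site: GC^GGCCGC

def noti_cut_positions(seq: str) -> list[int]:
    """0-based top-strand cut positions for every NotI site in seq."""
    positions = []
    start = seq.find(NOTI_SITE)
    while start != -1:
        positions.append(start + NOTI_CUT_OFFSET)
        start = seq.find(NOTI_SITE, start + 1)
    return positions

BSL_18E = "AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN"  # SEQ ID NO: 1 (V, N are degenerate bases)
P20EA = "TAATACGACTCCGAATTCCC"                  # SEQ ID NO: 2

print(noti_cut_positions(BSL_18E))  # [5]
print(noti_cut_positions(P20EA))    # []
```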

(3. PCR)

For the first PCR amplification from the double-stranded complementary DNA (1st PCR), the common adaptor primer P20EA was paired with each immunoglobulin isotype C region-specific primer (CM1, CA1, CG1, CD1, or CE1), and 20 cycles were performed with the following reaction composition, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. The primer sequences used are shown in Table 1-1.

TABLE 1-1E 1st PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CM1, CA1, CG1, CD1 or CE1 primer | 0.5 | 250 nM
Double-stranded complementary DNA | 2 |
Sterilized water | 7 |

The 1st PCR amplicon, i.e., the product of the first PCR amplification reaction, was then used for nested PCR (2nd PCR) between the P20EA primer and each immunoglobulin isotype C region-specific nested primer (CM2, CA2, CG2, CD2, or CE2) with the reaction composition shown below. 20 cycles of PCR were performed, each consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. The primer sequences used are shown in Table 1-1.

TABLE 1-1F 2nd PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CM2 (CA2, CG2, CD2 or CE2) primer | 1 | 500 nM
1st PCR amplicon | 2 |
Sterilized water | 6 |

Column purification was performed with a High Pure PCR Cleanup Micro Kit (Roche) to remove primers from the 2nd PCR amplicon, i.e., the product of the second PCR amplification reaction. Subsequently, with the 2nd PCR amplicon as a template, PCR was performed with the following reaction composition using the B-P20EA primer, which is the P20EA primer with an adaptor B sequence attached, and a GS-PCR primer, which is the C region-specific primer of each immunoglobulin isotype with an adaptor A sequence and an MID tag (identification) sequence attached. 10 cycles of PCR were performed, each consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. The primer sequences used are shown in Table 1-1.

TABLE 1-1G GS-PCR (3rd PCR) amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer | 1 | 500 nM
10 μM CM3-GSPCR, CA3-GSPCR, CG3-GSPCR, CD3-GSPCR or CE3-GSPCR primer | 1 | 500 nM
2nd PCR amplicon after column purification | 1 |
Sterilized water | 7 |
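Each GS-PCR primer in Table 1-1 is a fusion of four parts in 5′→3′ order: adaptor A, the TCAG key, an MID tag, and the isotype-specific C region sequence. The sketch below assembles CM3-GS that way. Because the MID sequences are given in this section only by SEQ ID NO, the MID used here is a hypothetical placeholder and not the actual MID1.

```python
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGAC"  # SEQ ID NO: 39
KEY = "TCAG"
CM3_GENE_SPECIFIC = "AAAGGGTTGGGGCGGATGC"  # SEQ ID NO: 1387 (IgM C region part)

# Hypothetical placeholder; the real MID1 is defined in the Sequence Listing.
PLACEHOLDER_MID = "ACGTACGTAC"

def gs_pcr_primer(mid: str, gene_specific: str) -> str:
    """Assemble adaptor A + key + MID + gene-specific sequence (5'->3')."""
    return ADAPTOR_A + KEY + mid + gene_specific

primer = gs_pcr_primer(PLACEHOLDER_MID, CM3_GENE_SPECIFIC)
print(len(primer))  # 26 + 4 + 10 + 19 = 59 nt with the placeholder MID
```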

(4. Next Generation Sequencing)

After GS-PCR amplification under the optimal conditions, 2% agarose gel electrophoresis was performed; after visualization, the band of the size of interest (500 bp to 700 bp) was excised and purified using a DNA purification kit (QIAEX II Gel Extraction Kit, QIAGEN). The amount of recovered DNA was measured using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen). The DNA concentrations recovered for the amplicons derived from each isotype were IgM (1611 ng/mL), IgG (955 ng/mL), IgA (796 ng/mL), IgD (258 ng/mL), and IgE (871 ng/mL). The amplicons were mixed so that each isotype contributed an equal amount of DNA, and 10 million DNA molecules were used in emulsion PCR for sequence analysis with Roche's next-generation sequencer (GS Junior Bench Top system).
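Mixing the isotype amplicons so that each contributes the same amount of DNA means taking a volume inversely proportional to each measured concentration. The sketch below computes such volumes from the PicoGreen values above and converts between DNA mass and an approximate molecule count. The 1 ng-per-isotype target, the 600 bp fragment length (within the excised 500-700 bp band), and the average of 650 g/mol per base pair are assumptions for illustration, not values from this Example.

```python
AVOGADRO = 6.02214076e23  # molecules per mol
BP_MASS = 650.0           # g/mol per base pair of dsDNA (common approximation)

# Recovered DNA concentrations (ng/mL) reported above.
conc_ng_per_ml = {"IgM": 1611, "IgG": 955, "IgA": 796, "IgD": 258, "IgE": 871}

def equal_mass_volumes_ul(conc, mass_ng_each):
    """Volume (uL) of each amplicon containing mass_ng_each ng of DNA."""
    return {iso: mass_ng_each / c * 1000.0 for iso, c in conc.items()}

def molecules_from_ng(mass_ng, length_bp):
    """Approximate number of dsDNA molecules in mass_ng of a length_bp fragment."""
    return mass_ng * 1e-9 / (length_bp * BP_MASS) * AVOGADRO

def ng_for_molecules(n_molecules, length_bp):
    """Mass (ng) holding n_molecules fragments; inverse of molecules_from_ng."""
    return n_molecules / AVOGADRO * length_bp * BP_MASS * 1e9

volumes = equal_mass_volumes_ul(conc_ng_per_ml, mass_ng_each=1.0)  # hypothetical 1 ng each
ng_for_10_million = ng_for_molecules(1e7, 600)  # about 0.0065 ng for assumed 600 bp fragments
```

The least concentrated amplicon (IgD) needs the largest volume, which in practice limits how much of each isotype can go into an equal-mass pool.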

(5. Data Analysis)

In the sequence read analysis, V, D, J, and C sequences were assigned to each read using the V, D, J, and C sequences obtained from the IMGT (the international ImMunoGeneTics information system, http colon//www dot imgt dot org) database as reference sequences. IMGT's HighV-QUEST and newly developed repertoire analysis software (Repertoire Genesis; see the patent application filed concurrently, the content of which is incorporated herein by reference) were used for the assignment.
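The idea of assigning each read to a reference gene can be illustrated with a deliberately simplified sketch in which a read is assigned the reference with which it shares the most k-mers. This k-mer scoring is a swapped-in technique chosen for brevity; it is not the alignment procedure of HighV-QUEST or of Repertoire Genesis, and the two reference strings below are made-up toy sequences, not IMGT entries.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_gene(read: str, references: dict[str, str], k: int = 8):
    """Return the reference name sharing the most k-mers with read (None if none)."""
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0
    for name, ref in references.items():
        score = len(read_kmers & kmers(ref, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy reference set (hypothetical sequences, not IMGT data).
toy_refs = {
    "toyV1": "ACGTTGCAAGGCTTACGATCGA",
    "toyV2": "TTGACCGTAGGCATCCGATTAG",
}
read = toy_refs["toyV1"][2:20]  # an 18-nt fragment of toyV1
print(assign_gene(read, toy_refs))  # toyV1
```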

FIG. 1 shows the cross-reactivity of the isotype-specific primers. To assess the specificity of the immunoglobulin isotype-specific primers used, amplification was performed both with the isotype-specific primer of interest and with primers specific to other isotypes to check for cross-reactivity. 10 μL of each GS-PCR amplicon was subjected to electrophoresis on a 2% agarose gel in TAE buffer and assessed by ethidium bromide staining. A 2nd PCR amplicon amplified with each isotype-specific primer was not amplified with the GS-PCR primers specific to other isotypes, verifying the high specificity of the primers.

FIG. 2 shows the results of studying the optimal dilution. Optimal GS-PCR conditions were studied for each isotype: a 2-fold serial dilution series of the 2nd PCR amplicons was prepared and subjected to 20 cycles of GS-PCR. Excellent results were obtained with the 16-fold dilution.
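In a 2-fold serial dilution, step n carries a dilution factor of 2ⁿ, so the 16-fold dilution that performed best is the fourth step of the series. A trivial sketch of this relationship (the six steps generated here are an assumption; the text does not state how many dilutions were tested):

```python
import math

def serial_dilution_factors(n_steps: int, fold: int = 2) -> list[int]:
    """Dilution factors for the undiluted sample followed by n_steps serial dilutions."""
    return [fold ** i for i in range(n_steps + 1)]

factors = serial_dilution_factors(6)   # [1, 2, 4, 8, 16, 32, 64]
step_for_16_fold = int(math.log2(16))  # 16-fold = 4 two-fold dilution steps
```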

FIG. 3 shows the results of studying the optimal number of cycles. The 16-fold-diluted 2nd PCR amplicons were subjected to 10, 15, and 20 cycles of PCR. For IgM, IgG, IgA, and IgD, excellent amplification was confirmed with 10 cycles, whereas 20 cycles were found to be appropriate for IgE.

FIG. 4 shows the read lengths obtained by next-generation sequencing of the BCR genes. The raw data contained 130,000 reads, of which more than 90,000 reads passed the filter. Table 1-2 shows the number of reads for each isotype, identified by its MID tag.
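Sorting reads by MID tag can be sketched as exact prefix matching on the key followed by the MID, assuming each read begins with the key (TCAG) and the MID in the order in which they sit in the GS-PCR primers of Table 1-1. The MID sequences below are hypothetical placeholders, and this sketch, unlike a production pipeline, does not tolerate sequencing errors in the tag.

```python
KEY = "TCAG"

def demultiplex(reads, mids, key=KEY):
    """Bin reads by key+MID prefix; the prefix is trimmed from binned reads."""
    bins = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        for name, mid in mids.items():
            prefix = key + mid
            if read.startswith(prefix):
                bins[name].append(read[len(prefix):])
                break
        else:  # runs only when no MID prefix matched
            bins["unassigned"].append(read)
    return bins

# Hypothetical MID sequences (placeholders, not the Sequence Listing entries).
mids = {"MID1": "AAAAACCCCC", "MID2": "GGGGGTTTTT"}
reads = [KEY + mids["MID1"] + "ACGT", KEY + mids["MID2"] + "TTTT", "GGGGACGT"]
bins = demultiplex(reads, mids)
```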

TABLE 1-2 Number of reads for each isotype
MID Tag | Isotype | Reads
MID1 | IgM | 22267
MID2 | IgG | 18031
MID3 | IgA | 15964
MID4 | IgD | 22248
MID5 | IgE | 15219
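Summing Table 1-2 confirms that the MID-sorted reads account for the more than 90,000 filter-passed reads and gives each isotype's share of the run. A minimal Python sketch using the table values:

```python
# Reads per isotype, as listed in Table 1-2.
reads_per_isotype = {"IgM": 22267, "IgG": 18031, "IgA": 15964, "IgD": 22248, "IgE": 15219}

total_reads = sum(reads_per_isotype.values())  # 93,729
share = {iso: n / total_reads for iso, n in reads_per_isotype.items()}
```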

FIG. 5 shows the read length for each MID. Read lengths and read numbers were evenly distributed across the MIDs. When reads of 400 bp or longer, a length sufficient for analyzing the V region, were counted, about half of the reads for each MID (about 10,000 reads) were considered effective for BCR repertoire analysis.
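The 400 bp criterion amounts to a simple length filter. A sketch, assuming reads are available as plain sequence strings (the function names are hypothetical):

```python
MIN_LENGTH_FOR_V_REGION = 400  # bp, threshold used in the text

def effective_reads(reads, min_length=MIN_LENGTH_FOR_V_REGION):
    """Keep reads long enough to cover the V region."""
    return [r for r in reads if len(r) >= min_length]

def effective_fraction(reads, min_length=MIN_LENGTH_FOR_V_REGION):
    """Fraction of reads passing the length filter (0.0 for an empty input)."""
    return len(effective_reads(reads, min_length)) / len(reads) if reads else 0.0

example = ["A" * 350, "A" * 399, "A" * 400, "A" * 520]
```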

FIG. 6 shows the usage frequency of C region sequences for each isotype. The reads obtained for each isotype were searched for homology with the C region sequences of the immunoglobulin isotypes, including subclasses. Within IgA, 73% of the reads were IgA1 and 27% were IgA2; within IgG, 62% were IgG1 and 36% were IgG2, while hardly any reads were obtained for IgG3 or IgG4. Furthermore, because the reads obtained for each isotype were rarely classified into other classes, primer specificity was reconfirmed at the sequence level.

V, D, and J regions were assigned using IMGT's HighV-QUEST (FIGS. 6A, 7A, 8A, and 9A). V, D, and J regions were also assigned using the newly developed repertoire analysis software (Repertoire Genesis, patent pending). The read-count data were used to determine the frequencies of V regions and J regions (FIGS. 6B, 7B, 8B, and 9B). The assignment results obtained with Repertoire Genesis are shown below (Table 1-1H).
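A gene usage frequency is a gene's read count divided by the total number of reads considered. As an illustration, the sketch below applies this to the IgE D-gene counts of Table 1-1H3, with and without the unidentified reads in the denominator (the text does not specify which convention the frequencies use), and computes the share of the most frequent CDR3 within IGHD4-17.

```python
# D-gene read counts for IgE, Table 1-1H3.
ige_d_counts = {"IGHD4-17": 3475, "IGHD1-7": 3, "(unidentified)": 166}

def usage_frequencies(counts, exclude=()):
    """Frequency of each gene among the reads not listed in `exclude`."""
    kept = {gene: n for gene, n in counts.items() if gene not in exclude}
    total = sum(kept.values())
    return {gene: n / total for gene, n in kept.items()}

freq_all = usage_frequencies(ige_d_counts)  # denominator 3644
freq_assigned = usage_frequencies(ige_d_counts, exclude={"(unidentified)"})  # denominator 3478

# Most frequent CDR3 (CARGFDGGWEHW, 3103 reads) within IGHD4-17 (3475 reads).
top_cdr3_share = 3103 / 3475
```

The same function applies unchanged to V- or J-gene counts.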

TABLE 1-1H1 IgA
Gene name (total reads)
CDR3 amino acids | Reads | SEQ ID NO
IGHD2-21 (74 reads)
CAKDMCGLWASCGGDCYSRRTTSLTT | 41 | 61
CAKDMCGLWASCGGDCYSRRTASLTT | 5 | 62
CARGPNMAFVVVTAILMLLIS | 4 | 63
CARAPDCGGSTCYSHPYYGMDVW | 4 | 64
CARSHIVVVTAIPLEMLLIS | 2 | 65
CARDPRIVVVAPATHTPTTVWTS | 2 | 66
IGHD6-13 (41 reads)
CGRSRHSSSWQILTP | 11 | 67
CANGGLAAAGDHLTT | 5 | 68
CALCPTPIAAAGSVTT | 5 | 69
CARAPSIPVAGIGYHFDHW | 3 | 70
CALCPNPYSSGWFCNYW | 3 | 71
CARAPSIPVAGIATTLTT | 2 | 72
IGHD3-3 (37 reads)
CIYDFWSGGPHPTLTT | 11 | 73
CARIVNTEGFWSGFLTP | 4 | 74
CTRRGGVVIICLTT | 2 | 75
CIHTGNDFWTGTNYGLTS | 2 | 76
CARIVNTEGFGVVFLTP | 2 | 77
CAKDRFSGRGRFEFMEWLTPLTT | 2 | 78
CAKDRFSEGKVQFMEWLTPLTT | 2 | 79
IGHD3-22 (36 reads)
CARRPIPPLTMRVVVIPLTS | 5 | 80
CARDPPMPMIVVQTLTT | 2 | 81
CARDPPMPMILVQTLTT | 2 | 82
CAKILITMILVVSLMLLIS | 2 | 83
IGHD3-10 (33 reads)
CAREIRGTTMVRELTTSTATWTP | 6 | 84
CVRTYYFGLGDIITEITSTVWTS | 3 | 85
CATYYYGSGSAGHNFDYW | 2 | 86
CARRTYYYGSTNLTT | 2 | 87
CARMVTDYYGSGNRGWFDPW | 2 | 88
CARGPGLSVMIRGVITTPNHILIT | 2 | 89
CARDYYGSGVMTL | 2 | 90
IGHD2-15 (29 reads)
CARAPDCGGGTCYSHPTTVWTS | 5 | 91
CARARIVVVVPATLTPTTVWTS | 3 | 92
CARAPDCGGGTCYSHPYYGMDVW | 3 | 93
CAKDLAPLKSCSRGGCYPYYYGLDIW | 3 | 94
IGHD1-26 (22 reads)
CARGPATAILGATPSLTP | 3 | 95
CARDDSASYSRGTT | 3 | 96
CVRHDYSDNDLSTNWFGP | 2 | 97
IGHD5-5 (19 reads)
CMGPGDTAI | 7 | 98
CARRPREMESAMVLSLTT | 2 | 99
IGHD3-9 (19 reads)
CAHSAPYYDILSRNRARSWKDFDNW | 3 | 100
CAHSAPIMIFCLVTAHEVGRILTT | 3 | 101
CATVALLRYFDWSSTR | 2 | 102
IGHD6-19 (14 reads)
CARGRRSLPAIVYSSGPDRPNWFDPW | 6 | 103
CTSAAVASSSGWPLRGVWTS | 3 | 104
IGHD2-2 (14 reads)
CARAPLCSSASCHLQLDYW | 5 | 105
IGHD4-23 (11 reads)
CARGAGYGGNSGVRTT | 9 | 106
IGHD4-17 (10 reads)
CARTLYGDFVDF | 2 | 107
IGHD3-16 (9 reads)
CAKGVLSSGGVIATLPGSTP | 3 | 108
CARGFGARGVILT | 2 | 109
IGHD2-8 (8 reads)
CANVGGADRNYCINGVRHNPNYLTT | 5 | 110
IGHD6-6 (5 reads)
IGHD5-12 (5 reads)
CARHVNGYDYLFPFTSW | 3 | 111
IGHD1-1 (3 reads)
CARGGSQLERRRPLVTT | 3 | 112
IGHD5-24 (2 reads)
Unidentified (662 reads)
CARETVGGTLTT | 19 | 113
CARTSSGHDPPIITGWTS | 13 | 114
CAKGHQVRLRGRTGTSIS | 11 | 115
CARSPIWFGSHRFTTTWRS | 9 | 116
CARDPLETGATSLII | 9 | 117
CAKLGNRPGFTEWDHWFGPW | 9 | 118
CAGAPDCGVGAAPLTSTTVCTS | 8 | 119
CVRDPHETGATTLIT | 6 | 120
CARIRKEVGAPPITWTS | 6 | 121
CARGSWSGAAFYSLTT | 6 | 122
CARDPNKFRTNHLSTT | 6 | 123
CAGIGGATSTTTTTTWTS | 6 | 124
CATVPELTDISLPRLMALIS | 5 | 125
CARVWGKHTLTT | 5 | 126
CARRAAPHDYGHVLIF | 5 | 127
CARDPNKFRPNHLSTT | 5 | 128
CARAGRELLRALMTT | 5 | 129
CARAGAELLRALMTT | 5 | 130
CARAEDYYDTEGYFYLTP | 5 | 131
CAHRTNYSTNRYGAFTTLTS | 5 | 132
CVRHDGSFTKTGSTP | 4 | 133
CVRDPQETGATTLIT | 4 | 134
CVKIGAAH | 4 | 135
CATQCLGGAGLTTTTAPWTS | 4 | 136
CARRTYYSGSTNLTT | 4 | 137
CARRTTRETGSSIS | 4 | 138
CARLRCSNDNCAGHLYYYFSGLDIW | 4 | 139
CARASLPRGLLIS | 4 | 140
CAKGRGRRAAGKFLTT | 4 | 141
CVRQYGLGSGSLTP | 3 | 142
CVKIRNLIGFTGSTP | 3 | 143
CTRDGVRGDLNPTLNV | 3 | 144
CTKGGGRKTAGKFLTP | 3 | 145
CDKAKVTADLRT | 3 | 146
CATVPELPDISLPRLMALIS | 3 | 147
CATVFGRRYRLLTT | 3 | 148
CARYRAAYPRRAWTS | 3 | 149
CARTIGFEIAMTGGLGALTP | 3 | 150
CARTARTGDL | 3 | 151
CARRDPPVRASLSTTLTS | 3 | 152
CARIGHEFYSLTYSVNDVFDLW | 3 | 153
CARFQRYCRGGSCSATLDAFDKW | 3 | 154
CARDLGERRDGEVINWFDAW | 3 | 155
CARDLAVWATLTT | 3 | 156
CAKGAGRRAAGKFLTT | 3 | 157
CAKDVEPTVTLYNHFDP | 3 | 158
CAKDFNWEGIT | 3 | 159
CAHRTNYSTNRYGGLYYFDFW | 3 | 160
CVRGVGTILWLTI | 2 | 161
CVRFIGAYSNNWYPGYFDYW | 2 | 162
CVRDAGPGGSLTS | 2 | 163
CTTGFSGSTACHWDHTACHWDDAFAMW | 2 | 164
CTHAVESLLGTTSTS | 2 | 165
CIHTGNDFGPGPTMVWTS | 2 | 166
CGVGRGDNDVDFKFKW | 2 | 167
CATRESPLTT | 2 | 168
CATAGIELWRAGSTP | 2 | 169
CASQSQNYYYYYMDVW | 2 | 170
CASKKEILWAGPNLTT | 2 | 171
CARYRIAMATSPYFDYW | 2 | 172
CARVRCGLVASEGVLIS | 2 | 173
CARTNFGSGGYILGDTTMVWTS | 2 | 174
CARSAGYLHRRTS | 2 | 175
CARRTYYSGSTNFDYW | 2 | 176
CARRDLPFGASLSTTLTS | 2 | 177
CARRAAPMTTGMFLIF | 2 | 178
CARPGFSYGPRLTP | 2 | 179
CARLRGGFPPVVKRVEVFLLTS | 2 | 180
CARKKIPTAGYSSLTT | 2 | 181
CARGSWMGRPFISLTT | 2 | 182
CARGRFARGGDDSLIS | 2 | 183
CARGLRWADN | 2 | 184
CARGGTSGLILDTTSTPWTS | 2 | 185
CAREMHIDSLTVGRAFDIW | 2 | 186
CARDVPDIYSSGATDC | 2 | 187
CARDPSYLPTPALKT | 2 | 188
CARDPNKFRPNHFVDYW | 2 | 189
CARDLGTTNYWLDTW | 2 | 190
CAKQRASGNSLTI | 2 | 191
CAKEPKIVGRRRTTLIT | 2 | 192
CAKDLGVCSEGAASSLVLIS | 2 | 193
CAKAPGDLCRSTP | 2 | 194
CAHSAPYYDICLVTAHEVGRILTT | 2 | 195
CAGLIGRFIPLTT | 2 | 196
CAGIRGSNIYYHYYYMDVW | 2 | 197
CADLPGIIGGEIT | 2 | 198

TABLE 1-1H2 IgD
Gene name (total reads)
CDR3 amino acids | Reads | SEQ ID NO
IGHD3-22 (432 reads)
CARHDTPRVYYDSSGYYYGVDYFDYW | 168 | 199
CASMDTKNYYDSSGSQPRRSYYFDYW | 39 | 200
CAQYYYDSSGYYYYYGMDVW | 25 | 201
CARTSYYYDSSGYYYRDW | 21 | 202
CARVRGITMIVVVTTLTT | 17 | 203
CASMDTKNYYDSSGSQPGGRTTLTT | 7 | 204
CASMDTKITMIVVVPNPGGRTTLTT | 7 | 205
CARYNYTIVVGP | 5 | 206
CARTSYYYDSSGYYTVT | 5 | 207
CARIRYYYDSSGYYYFDYW | 4 | 208
CARHVRDGMIVVAEIDYW | 4 | 209
CARVAVRSYYPFGMDVW | 3 | 210
CARLPLDSSGYYLTT | 3 | 211
CARLPLDSSGYYFDYW | 3 | 212
CASMDTKNYYDSSGSQPRRSHYFDYW | 2 | 213
CARYRITMIVVVITTVT | 2 | 214
CARYNYYDSSGSW | 2 | 215
CARVRGYYDSMSMSALLMS | 2 | 216
CARVRGTMIVVSMSALLMS | 2 | 217
CARVRGNYYDSSGYYFDYW | 2 | 218
CARSGRVGARPKLYYW | 2 | 219
CARTSYYYDVVVITTVT | 2 | 220
CARHVRDGMIVVAEMTT | 2 | 221
CARHDTPRAYYDSSGYYYGVDYFDYW | 2 | 222
CAREFFGTRTMIVVVTYFDYW | 2 | 223
CAQYYYDSMVITTTTVWTS | 2 | 224
IGHD3-10 (217 reads)
CARGVRGVIINTFTTLTT | 118 | 225
CTWFGEATTTVWTS | 25 | 226
CARGGSGVIINTFTTLTT | 10 | 227
CARCAGGSGSYYYYYMEVW | 9 | 228
CVKAGFGELLIGGDRTT | 6 | 229
CARGGSGSYYKHVYYFDYW | 6 | 230
CARCAGGSGVTTTTTWRS | 6 | 231
CTWFGGGYYYGMDVW | 5 | 232
CTWFGGATTTVWTS | 3 | 233
CARLDGSGRRGTALTT | 3 | 234
CARLDVRGGRGTALTT | 2 | 235
CARGGSGVIINTFTTLTM | 2 | 236
CARCAGVRGVTTTTTWRS | 2 | 237
IGHD3-16 (169 reads)
CARRVMITFGELSSTTLTT | 140 | 238
CARRVMITFGGVIVTTLTM | 3 | 239
CARRVMITFGGVIVDYFDYW | 3 | 240
CARRVMITFGGVIVDTLTT | 3 | 241
CANPTSFRQCSMTT | 3 | 242
CARRVMITLGELSSTTLTT | 2 | 243
CARRAMITFGELSSTTLTT | 2 | 244
IGHD6-19 (134 reads)
CARHGIAVAYYFDYW | 28 | 245
CARVSSGWSGGNPAPATLTT | 22 | 246
CARHVGSGWVYFDYW | 12 | 247
CARRDDSSGWYGHDYW | 11 | 248
CARGYSSGFGDALIP | 8 | 249
CARVSSGWSGVTPAPATLTT | 7 | 250
CARRDDSMAGTAMTT | 4 | 251
CARGYSSGFGDAFDTW | 4 | 252
CARGIRYSSGWYGSNWFDPW | 4 | 253
CARRDDSSGWYGHDY | 3 | 254
CARRDDSSAGTAMTT | 3 | 255
CARHGIAWPTTLTT | 3 | 256
CARHVGSGWVTLTT | 2 | 257
CARHGIAVAYYLTT | 2 | 258
IGHD5-5 (122 reads)
CARAGGYSYGYLLPLMLLIS | 29 | 259
CARRKRELLWVTTTTTTWTS | 20 | 260
CARQKSATVWTS | 17 | 261
CARVNLEQLWYRRGTTTTVWTS | 8 | 262
CARVNLEQLCTGRGTTTTVWTS | 7 | 263
CARFYNRRMLSTAMVDIDYW | 5 | 264
CARVNLEQLWYRTGYYYYGMDVW | 4 | 265
CARLFNYAREYGMDVW | 4 | 266
CARVNLEQLWYRTGSTTTVWTS | 2 | 267
CARVAPRLTT | 2 | 268
CARLFNYARGVRVWTS | 2 | 269
IGHD1-26 (85 reads)
CARHVGSGWVYFDYW | 16 | 270
CARAQYSGATECKGTLTT | 10 | 271
CARHSLTPGFLLNYFDYW | 8 | 272
CTRSRGLSGTYYNPDNDYW | 7 | 273
CARAQYSGSYRMQRYFDYW | 7 | 274
CARHVKVLGATVGFDYW | 3 | 275
CARAQYSGSYRCKGTLTT | 3 | 276
CTRSRGLSGTTTIQIMTT | 2 | 277
CARPSIVGATECKGTLTT | 2 | 278
CARHVAVAGSTLTT | 2 | 279
CARAQYSGSYRMQRYLTT | 2 | 280
CARAQYSGSYRMQGTLTT | 2 | 281
CAHTHRSVGATA | 2 | 282
IGHD6-13 (73 reads)
CAKVTHAYSSTWYHGDYYYYGMDVW | 28 | 283
CARGHLPYSSTDKGHWFDPW | 16 | 284
CARDSSHGYSSSWPDYW | 4 | 285
CAKLPMRIAAPGTMGTTTTTVWTS | 3 | 286
CARDSSTGIAAAGPTT | 2 | 287
IGHD2-15 (46 reads)
CLASRPLWFGDPNGSTP | 5 | 288
CAKDSSRYCSGGSCKYFDYW | 4 | 289
CAKNPASTGYGSFDYW | 3 | 290
CAKIRPVLVTEALTI | 3 | 291
CAKDSSRYCMVVAANTLTT | 3 | 292
CARLILGYCSGVGCTPT | 2 | 293
CARDRGSGGSCYVLTT | 2 | 294
CARDGVVVVLLLLTT | 2 | 295
CANWARVVVASGTTTTWTS | 2 | 296
CAKNPPVLVTEALTI | 2 | 297
CAKNPAVLVTEALTI | 2 | 298
IGHD3-3 (25 reads)
CASKKKFLEWPETTTTTVWTS | 6 | 299
CARKEFLEWPETTTTTVWTS | 5 | 300
CAKDINPDYDFWSGSHLPYDAFDIW | 5 | 301
CARKKNFLEWPETTTTTVWTS | 2 | 302
CAKDINPDYDFWSGSHLPYDALIS | 2 | 303
CARDARYCSSTSCYSFPYWYFDLW | 2 | 304
IGHD2-2 (18 reads)
CARDARYCSSTSCYSFPTGTSIS | 2 | 305
CARDARYCSSTSCYRFPYWYFDLW | 2 | 306
CARRLGRVATTYYMDVW | 5 | 307
IGHD5-12 (15 reads)
CARRLVEWLRPTTWTS | 3 | 308
CARRVGVEWLRPTTWTS | 2 | 309
CTRDIVLTTPREWYFDLW | 5 | 310
IGHD2-8 (13 reads)
CTRDIVLTTPGSGTSIS | 3 | 311
CARDLIYGDYPTTTWTS | 4 | 312
IGHD4-17 (5 reads)
CARGAAPGVETGSTP | 264 | 313
Unidentified (989 reads)
CARHTLFSDSSAPPRGVYYYYYMDVW | 60 | 314
CARGATVGVETGSTP | 52 | 315
CANWAGVTGTVPLTT | 41 | 316
CARQKSATVWTS | 33 | 317
CARAGIQLEVFTLTT | 21 | 318
CARLDGSGGRGTALTT | 20 | 319
CARRKRELLWVTTTTTTWTS | 19 | 320
CANPDLISAMFDDYW | 14 | 321
CARHQCSGEACFYYYGMDVW | 13 | 322
CANPTSFRQCSMTT | 12 | 323
CARGGTIPFPWTS | 9 | 324
CARAQGGAHTTLTT | 9 | 325
CTRDTGSSAGATDLW | 8 | 326
CAKAVAVTGSHFDYW | 8 | 327
CARGAAPGVETGSTL | 7 | 328
CARAQGRGTYYFDYW | 7 | 329
CARAIIRYFND | 7 | 330
CAIPPDGSRRSPLTT | 7 | 331
CVREGFCGAHGCYSLTYW | 6 | 332
CARHTLFSDSSAPPRGSTTTTTWTS | 6 | 333
CARGYSSASVMLLIP | 6 | 334
CVREGFVVLMAVILLPT | 5 | 335
CARSGPRGLTT | 5 | 336
CARHSLTPGFLLNYFDYW | 5 | 337
CARGWELDRW | 5 | 338
CARDPSSLYYYYYGMDVW | 5 | 339
CARAIIHISMT | 5 | 340
CANPDSFRQCSMTT | 5 | 341
CAIPRTEGRRSPLTT | 5 | 342
CARTFGDSAALIS | 4 | 343
CARLFNYAREYGMDVW | 4 | 344
CARHTLFSDSSAPPTGGLLLLLHGRL | 4 | 345
CARHTLFRIVVPLPGGSTTTTTWTS | 4 | 346
CARDLGESSSTTLTT | 4 | 347
CARAIIRYFNDW | 4 | 348
CAPGGLRLGVETGSTP | 4 | 349
CARTLDYGIATGSIIMVWTS | 3 | 350
CARRLGRVARPTTWTS | 3 | 351
CARLFNTPGSTVWTS | 3 | 352
CAIPPTEGRRSPLTT | 3 | 353
CAIPPDGRQTVPFDYW | 3 | 354
CTRDTGSPPEPLTS | 2 | 355
CATSRGGRGTT | 2 | 356
CATSGGVGDY | 2 | 357
CATPTSFRQCSMTT | 2 | 358
CATAAGLWSSSTTWTS | 2 | 359
CATAAGLWSSKYYMDVW | 2 | 360
CASKKSATVWTS | 2 | 361
CARVGSSTMLLIS | 2 | 362
CARVGSSPMLLIS | 2 | 363
CARVAARPMLLIS | 2 | 364
CARTFVIRLLLIS | 2 | 365
CARRLGRVLRPTTWTS | 2 | 366
CARRKKGSCYRVTTTTTTWTS | 2 | 367
CARLFNYARSTVWTS | 2 | 368
CARHVGMAGSTLTT | 2 | 369
CARHTLFSDSSAPPRGGLLLLLHGRL | 2 | 370
CARHTLFSDSSALPRGVYYYYYMDVW | 2 | 371
CARHTLFSDSSALPGGSTTTTTWTS | 2 | 372
CARHQCSGEACFYYTAWTS | 2 | 373
CARGYSMASVMLLIP | 2 | 374
CARGWELDR | 2 | 375
CARGQMGATTLIDYW | 2 | 376
CAREWPTGTRGMW | 2 | 377
CAREWPTGTRGMC | 2 | 378
CAREWPTGNQRGCG | 2 | 379
CARDSTQTT | 2 | 380
CARAQYGGATECKGTLTT | 2 | 381
CARAQGAGAHTTLTT | 2 | 382
CARAKYDISMT | 2 | 383
CARAIYDISMT | 2 | 384
CARAIIRYSMT | 2 | 385
CARAHGAGAHTTLTT | 2 | 386
CANWPGVTGTVPLTT | 2 | 387
CAKRWGSSSWTT | 2 | 388
CAKIRPVLVTEALTI | 2 | 389
CAKDLHSYGYLGAFDIW | 2 | 390
CAIPPDGRQTSPLTT | 2 | 391
CAIPPDGRQTAPLTT | 2 | 392
CAHTHRSVGALP | 2 | 393
CAGVAPRLTT | 2 | 394

TABLE 1-1H3 IgE
Gene name (total reads)
CDR3 amino acids | Reads | SEQ ID NO
IGHD4-17 (3475 reads)
CARGFDGGWEHW | 3103 | 395
CARGFLMVAGEHW | 113 | 396
CARGFLMVAGST | 25 | 397
CARGFDGGWGAL | 21 | 398
CARGFDGGWEH | 17 | 399
CARGFDGAGST | 12 | 400
CARGFDGGWEYW | 10 | 401
CARGFDGGWEHR | 10 | 402
CAGGFDGGWEHW | 9 | 403
CARGLMVAGST | 8 | 404
CARGFDGGWST | 8 | 405
CARGFDGGREHW | 8 | 406
CVRGFDGGWEHW | 7 | 407
CARGLDGGWEHW | 7 | 408
CARGFDGGWGHW | 7 | 409
CARGFDGGWGALG | 6 | 410
CARGFDGAGEHW | 6 | 411
CARGSDGGWEHW | 5 | 412
CARGFGGGWEHW | 5 | 413
CARGFDGSWEHW | 5 | 414
CARGFDGGWERW | 5 | 415
CARGFDDGWEHW | 5 | 416
YARGFDGGWEHW | 4 | 417
CARGFDGGWKHW | 4 | 418
CARGFDGGSGAL | 4 | 419
CARGFVWWLGGT | 3 | 420
CARGFDSGWEHW | 3 | 421
CARGFDRWLGAL | 3 | 422
CARGFDGGWVHW | 3 | 423
CARGFDGGWEAL | 3 | 424
CARGFDGDWEHW | 3 | 425
CARGFDGAGSM | 3 | 426
CARSFDGGWEHW | 2 | 427
CARGFLMVAGEHG | 2 | 428
CARGFLMVAGEH | 2 | 429
CARGFDVAGST | 2 | 430
CARGFDGGWEHS | 2 | 431
CARGFDGGWEHG | 2 | 432
CARGFDGGCEHW | 2 | 433
CARGFDGAGEH | 2 | 434
IGHD1-7 (3 reads)
CARGFDGGWEHW | 3 | 435
Unidentified (166 reads)
CARGFDGGWEHW | 124 | 436
CARGFLMVAGEHW | 14 | 437
CARGFDGGWEHS | 5 | 438
CARGFLMVAGST | 4 | 439
CARGFVWWLGGT | 2 | 440
CARGFLMVAGSTG | 2 | 441
CARGFDGGSGAL | 2 | 442
CARGFDGAGST | 2 | 443

TABLE 1-1H4 IgG
Gene name (total reads)
CDR3 amino acids | Reads | SEQ ID NO
IGHD3-10 (60 reads)
CARGRYAGGVIITALTP | 13 | 444
CARLPRMVRGNWFHP | 8 | 445
CARGAWAVRGVISWAGSTP | 6 | 446
CSREVGRDYYGSGVIEITWTS | 4 | 447
CAGSGSGSLLTTVWTP | 4 | 448
CSREVGRDYYGSGSYRNYMDVW | 3 | 449
CVSITNSLLWFGELLIFDCW | 2 | 450
IGHD3-22 (59 reads)
CAKITSMIVVLIPTMMLLMS | 20 | 451
CARGSRARFSSDTSGYQYFDYW | 4 | 452
CARGVYLYYDSHAYSVLTT | 3 | 453
CARVNYYDSVVLTT | 2 | 454
CARVNYYDSSRIDYW | 2 | 455
CARLPPFNNDDSSSYALYLTT | 2 | 456
CARHSNYYYDTSGYRVLDAFDIW | 2 | 457
CARGGMDSYGYFYVGHYDYW | 2 | 458
CARDPDF | 2 | 459
CAKITSMIVVLTPTMMLLMS | 2 | 460
IGHD6-13 (34 reads)
CTRQEESSAAGTGGTSSP | 7 | 461
CALCPTPIAAAGSVTT | 7 | 462
CATSEGDPVAAAGTKSWFDSW | 3 | 463
CARLALLYGSSRYGATLTT | 2 | 464
CARGPSSTWYSFDYW | 2 | 465
IGHD2-15 (31 reads)
CAKKEFILVVVITMMSLLMS | 6 | 466
CAKDMTAKACSDYW | 3 | 467
CARVMGCRGGRCDFRAFDIW | 2 | 468
CARRFCSGGICYFLTT | 2 | 469
CALTGLNGRSCYSELLIS | 2 | 470
CAKEGVYFSGGNHYDVAFNVW | 2 | 471
IGHD3-3 (28 reads)
CARPSRCCYSGGGRLTL | 4 | 472
CAHSVGFILDFWSGYQNNWFDPW | 4 | 473
CARPSRCCYVRGGRLTL | 2 | 474
CAMGPTIFGVVFLGSLTS | 2 | 475
CAHSVGFILDFWSGYQNNWFDPG | 2 | 476
CAHSVGFILDFGVVIRTTGSTP | 2 | 477
IGHD6-25 (21 reads)
CARVKGGIAGMAWTS | 19 | 478
IGHD5-5 (21 reads)
CARGVDTTMVRSTTLTT | 7 | 479
CARQDPYCSTSNCTMGGAMTLTT | 5 | 480
IGHD5-24 (21 reads)
CARTDGIRDGYNLHRVLTT | 2 | 481
CARTDAIRDGYNLHRVFDYW | 2 | 482
CARGKRDAYNYYSHLDSW | 2 | 483
CARGKEMPTITTLILTP | 2 | 484
IGHD3-16 (19 reads)
CVRQSPLDDVWGVFAPVGSTP | 11 | 485
CVRQSPLDDVWGVFAPVGSTL | 2 | 486
IGHD4-17 (14 reads)
CARHPKPPTVTSATT | 2 | 487
CAKGENTVTTGQEYW | 2 | 488
CAKGENTVTTGQEY | 2 | 489
IGHD3-9 (12 reads)
CAREGRNYDSLTGDPWFDPW | 2 | 490
IGHD2-2 (12 reads)
CASRYCTSDRCLGASGKPSFDTW | 2 | 491
CARHSLAYCSTTSCAVFDYW | 2 | 492
CARHGFEGREVVPPAMNEYYYYYMDVW | 2 | 493
IGHD1-7 (11 reads)
CARGDCTTINCNTHSDYYGLDVW | 3 | 494
CARTVGTGTTNGYLTS | 2 | 495
CAREIVLLSTATLTPTTTVWTS | 2 | 496
IGHD5-12 (10 reads)
CARQDSGYDYGYYHNGMDVW | 2 | 497
IGHD4-23 (9 reads)
CARGAGYGGNSGVRTT | 6 | 498
IGHD2-21 (9 reads)
IGHD6-19 (8 reads)
CARDLGSGWFRFDP | 2 | 499
CARDLGSGWFGSTP | 2 | 500
IGHD2-8 (7 reads)
CAKSHHCTNGVCHPPRFGQRSTP | 2 | 501
IGHD1-26 (6 reads)
IGHD1-20 (2 reads)
Unidentified (773 reads)
CARGATVGVETGSTP | 32 | 502
CARKGSRHGGSTP | 28 | 503
CARQNGPSIGGGSTP | 23 | 504
CARGATPGAETGSTP | 23 | 505
CAKDTLGGMGGLTS | 13 | 506
CARVRVLPEGVLISLRPLGSTTITWTS | 11 | 507
CARGGPKKVVTAAHLSP | 11 | 508
CSTLGLGPPGGQTT | 10 | 509
CARDHYDTRGVRMLLIS | 10 | 510
CATDRDSSWGTSLTT | 9 | 511
CARMVRGGGRTSSGYYYYYMDVW | 9 | 512
CARDGVWDLPTTLTT | 9 | 513
CVRMGPPCQLAGRSSSLTS | 6 | 514
CTMATVGHGLRRCFGKSTATLTS | 6 | 515
CARRGGSTVTTGTSIS | 6 | 516
CSTLGLGPPGGLTT | 5 | 517
CMGPGETAI | 5 | 518
CARVSMIRFRVWGLWTS | 5 | 519
CARVQRGAVVIPTT | 5 | 520
CARRRYNDLGAPNWVDPW | 5 | 521
CARGEDCGGGRCNNLPTTVWTS | 5 | 522
CAKRKLAPPRKFTTLTT | 5 | 523
CATLEGGAPPDLRRAEAFLLIS | 4 | 524
CARQDPYCSTSNCTMGGAMTLTT | 4 | 525
CARGKDCGGGRCNNVPYYGMDVW | 4 | 526
CAKDGHKLTGTTTRTS | 4 | 527
CAILPETQWYPRLTT | 4 | 528
CVRDLGAITPVFSTS | 3 | 529
CVHRPRWLNVVPT | 3 | 530
CVHRPRWLNVVPN | 3 | 531
CARSFVVKVHAHCGAVLSST | 3 | 532
CARRLNVAVVVPAYVGWFDPW | 3 | 533
CARLGKNHSQGVDYW | 3 | 534
CARGPGGVWDRLSLTS | 3 | 535
CARGKDCGGGRCNNVPTTGWTS | 3 | 536
CARGFMVQASSVRLKRGQFLADSW | 3 | 537
CARGDWGTVTLATT | 3 | 538
CARDWEWQQRLNYFDP | 3 | 539
CARDNQPWRDARNLGGAFDVW | 3 | 540
CARDGLRPPPFMVTIQRGGLTT | 3 | 541
CARAVGGFNSGWPSIGVPARSTP | 3 | 542
CARAVGGFNSGWPSIGVPARSTL | 3 | 543
CAKVDETVVLPAALLTP | 3 | 544
CAKSPKPWSQLVSTPIMPTPWTS | 3 | 545
CVRRAAGGRSGLTT | 2 | 546
CVRPPPTVPGTAGSTP | 2 | 547
CVRESTFYYFGPW | 2 | 548
CVRDDDYSRTWYMGQGASSDYGMDVW | 2 | 549
CVKWVSGVLTSLTT | 2 | 550
CVALFVPAGSTL | 2 | 551
CTMATVGHGATTLFREVHRNTDFW | 2 | 552
CSTLGLGPRGADYW | 2 | 553
CSRTGGRLLIS | 2 | 554
CSKVGRILKLIT | 2 | 555
CKVAVEMVLMY | 2 | 556
CGKFLGTTVASS | 2 | 557
CATSGRSSAWYPDVFDIW | 2 | 558
CATNYCRGISCYPAPLTT | 2 | 559
CATLTGGAPPDLRRAEAFLLIS | 2 | 560
CATEGTGAVTPFTT | 2 | 561
CATAPGGTSYT | 2 | 562
CASRPSWGSSFDFW | 2 | 563
CASRPPGAAALTS | 2 | 564
CASMIALHHTLTS | 2 | 565
CARYSPVDPSTLDFW | 2 | 566
CARVLDSSAHWYFDDW | 2 | 567
CARRRYNDLGAPTGSTP | 2 | 568
CARQNGPSIGGGSTL | 2 | 569
CARQHSEWEILRLVFDHW | 2 | 570
CARMVREEAERRPAIIITTWTS | 2 | 571
CARLPRMVRVTGSTP | 2 | 572
CARIDYVSTWYYDQW | 2 | 573
CARICAEREFLSLLTP | 2 | 574
CARGPGWGMGSTKFDCW | 2 | 575
CARGGKSATGANYHQFFDCW | 2 | 576
CARGDCTTINCNTHSTTTVWTS | 2 | 577
CARGATVGVETGSTL | 2 | 578
CARGATLGVETGWTP | 2 | 579
CAREYYGILYGYYFDYW | 2 | 580
CARDWEWQQRLNYFDPW | 2 | 581
CARDNQPWRDARNLGVHLMC | 2 | 582
CARDHYYDERNQGPDW | 2 | 583
CARDGGLAGTGTLEY | 2 | 584
CARAGLVLGPYGMDIW | 2 | 585
CARAGGHGTWTS | 2 | 586
CAKVAETLVSTGFDSYYAYSMDVW | 2 | 587
CAKTYDYGSRGFSILLIS | 2 | 588
CAKSLRVGGDVFEIW | 2 | 589
CAKSDYFDP | 2 | 590
CAKGRGRLVTIATTLTT | 2 | 591
CAKGAGRRAAGKFLTT | 2 | 592
CAKAKRRSLGMQTLPTLRGRSDGFDVW | 2 | 593
CAKAHFPGDLPSFSSIS | 2 | 594
CAKADCGTGCFIVDDW | 2 | 595
CAHQQWRPGRRGFDYW | 2 | 596

TABLE 1-1H5 IgM
Gene name (total reads)
CDR3 amino acids | Reads | SEQ ID NO
IGHD6-13 (148 reads)
CARTYSSWYRGPLSP | 24 | 597
CTRQEESSAAGTGGTSSP | 16 | 598
CARPIAAAGSRGFGTLTT | 15 | 599
CAQRRPSSSTWYAPTLTT | 7 | 600
CAQRRPNSSTWYAPTLTT | 4 | 601
CARDLGGYSSSWSTNYYYYMDVW | 3 | 602
CAKVNWGIAAAGSYAFDIW | 3 | 603
CAHRVRGMTSSSWYYGTFDYW | 3 | 604
CVRPGATAGTLLTV | 2 | 605
CARTYSSWYRGGPLSP | 2 | 606
CARPQRYSSSWYDDYYYGMDVW | 2 | 607
CARPIAAAGSRGVRYFDYW | 2 | 608
CARGVLAPLYSSTLKLRFSVWTS | 2 | 609
CARGLVAAAGTRRGWFTP | 2 | 610
CARGLVAAAGTRRGWFDPW | 2 | 611
CARDSGQIVAAVTLDYW | 2 | 612
CARAPSIPVAGIGYHFDHW | 2 | 613
CAQRRPNSSTWYRPLTLTT | 2 | 614
CAKVNWGYSSCWFLRFLIS | 2 | 615
CAKEGVPIAAPGLTT | 2 | 616
CAHSRAAAGSLTT | 2 | 617
CAHSRAAAGSFDYW | 2 | 618
CAHRVRGNDKQQLVLWGPLTT | 2 | 619
CAHRVRGMTSSSWYYGTLTT | 2 | 620
IGHD2-15 (123 reads)
CARVEGGLTT | 15 | 621
CAKGWTAARGALNTSST | 14 | 622
CARFWSGVGLTT | 7 | 623
CAREVYLYCNGGRCYWRGSSP | 5 | 624
CARSEYCRGGNCYFNGYYFDSW | 3 | 625
CARAPYCSGGSCYLFDYW | 3 | 626
CARGFVVVVTATLGTTITPWTS | 2 | 627
CARDLCGGSCSRTTGSTP | 2 | 628
CARAGYCSGGSCYGWFDPW | 2 | 629
CAKTKTGTTKINTTLTT | 2 | 630
CAKNGILTGWVNGYTTLTT | 2 | 631
CAKGQTTAILGSTDFNWFDPW | 2 | 632
CAKFHLQPSLLMVRRSPTS | 2 | 633
IGHD3-22 (117 reads)
CARDPRGAVGITTGPTH | 9 | 634
CARGSPPGAVGFIGSTP | 8 | 635
CAKDMGGITMIVVVMISLTT | 6 | 636
CARARRGHGSTTTWTS | 5 | 637
CARDPPRMLLIS | 4 | 638
CARDLSYYDSSGYYAYW | 4 | 639
CATYKYYDSSGFMTT | 3 | 640
CATYKYYDSSGFHDYW | 3 | 641
CARDLSYYDSSGYYAYV | 3 | 642
CAHRRPYYYDSSGYYYAFDYW | 3 | 643
CLTMIVPT | 2 | 644
CATVTGGSSGYYYHVYYFDYW | 2 | 645
CARVRYSGGILGLPLTT | 2 | 646
CARGRRGIVVVIPKEVRFDYW | 2 | 647
CARGIWVSTGYYRYYFDNW | 2 | 648
CAREGYDSGGYYYEVEAFDIW | 2 | 649
CARDAGPITTTVAGIIMRLLTF | 2 | 650
CAHRRPITMIVVVITMPLTT | 2 | 651
IGHD5-5 (89 reads)
CARGGGKDLLASYLTT | 19 | 652
CTSRGYSYGAPRWD | 7 | 653
CARRWGRGRDTAMNLTTTTVWTS | 7 | 654
CSRGGPGTAMVST | 3 | 655
CARRGGGGGDTAMNLTTTTVWTS | 3 | 656
CARHGDSFVQPRRTT | 3 | 657
CAKHDGQSNTLTA | 3 | 658
CATNTAMGFNEAVLIS | 2 | 659
CARRGYSSMGIWDLST | 2 | 660
CARRGGRGRDTAMNLTTTTVWTS | 2 | 661
CARQGSRLFHYYYYYMDVW | 2 | 662
CARHKPGYSYVFLTT | 2 | 663
IGHD3-10 (78 reads)
CGREGAGSAPWTS | 5 | 664
CARHRITMVRELSYTTTWTS | 5 | 665
CARDSDQQHGVRGVIPMAVWTS | 5 | 666
CATTPLMTLVRGLTTTWTS | 4 | 667
CARHQVSMVRGVTRSTGSTP | 4 | 668
CARDEWFGESEVTNLDAFDIW | 3 | 669
CAHSEGRITMVRGVIGPFDYW | 3 | 670
CARGQTYGSGPRGFDPW | 2 | 671
CARGLYGSGSYYIKRRKTGSTP | 2 | 672
CARAMVRGVLALTT | 2 | 673
CAKVGVGSMTMDRGVMTT | 2 | 674
IGHD3-3 (77 reads)
CARDPFGVVISTVWTS | 14 | 675
CARDRGRVLRFCPQGVPSLTT | 8 | 676
CASQTYYDFGVVIILLTTLTT | 5 | 677
CASLDFGVVIILTS | 4 | 678
CARHRSITILEWFVNHETGSTP | 4 | 679
CAQSHYDFGVVIILIPGSTP | 4 | 680
CARGSPHYDFGVEIRTGSTP | 3 | 681
CARDPLWSGYFYGMDVW | 3 | 682
CARVGTYDFGVVMSNS | 2 | 683
CARHRSITIFGVVRKSRNWFDPW | 2 | 684
CARDRFSLNSAFGVVEGSYWFDPW | 2 | 685
CAQSHYDFWSGYYSNTGFDPW | 2 | 686
IGHD1-26 (51 reads)
CASVGGTRGPGDPGLGT | 12 | 687
CAKGGFIVGATLTT | 4 | 688
CAKEGGRIVGATMTT | 3 | 689
CARVRYSGRYSRSTVDYW | 2 | 690
CARLKCGLTTCLHKTLIS | 2 | 691
CARDSVGATTTDYW | 2 | 692
IGHD6-19 (50 reads)
CAHPGSGWPLTTLTT | 6 | 693
CARGCSVAGTGSSTP | 4 | 694
CARARITVAAPYDYW | 4 | 695
CARLISSGWYLTT | 3 | 696
CARTSLEQQLVFMTENSSGWSFDYW | 2 | 697
CARGGIAVAGTRIKTTT | 2 | 698
CARDQQWLPDYV | 2 | 699
CARARITVAAPYDY | 2 | 700
CARARITVAAPMTT | 2 | 701
CAKGVGSGWYDFFDYW | 2 | 702
CAKGPREQWLAPYWYFDLW | 2 | 703
IGHD3-9 (39 reads)
CARGGSLVLDVLTT | 17 | 704
CARGGSLMLDVLTT | 10 | 705
CASGPYFDWLLTYMDVW | 2 | 706
CARGPLYDILTGPTPTTTTTWTS | 2 | 707
CARGGSIVLDVLTT | 2 | 708
IGHD2-8 (30 reads)
CAKWGGNSSWKS | 7 | 709
CARRSWCTNGVCYYISVALVTGSTP | 6 | 710
CARGSRYCTNGVCYFWFDPW | 3 | 711
CARDVLGYCTATACWRGGPNHYYYGMDVW | 3 | 712
IGHD5-24 (28 reads)
CARGIEMATILLTT | 16 | 713
CARGSRWLQFFDYW | 3 | 714
CAKGGERWLQSGATTLTT | 2 | 715
IGHD6-6 (22 reads)
CTRGLVIEDIAARPGGA | 2 | 716
CASDRGVQLVQDYYFGMDVW | 2 | 717
IGHD5-12 (21 reads)
CARNARGGVATIFRGSTP | 8 | 718
CARIQVATIDPKPKRLPSVWTS | 2 | 719
IGHD4-17 (19 reads)
CARDWNGDYDYYYYGMDVW | 6 | 720
CARDWNGDYTTTTTVWTS | 2 | 721
IGHD2-2 (16 reads)
CARDRSSTSCCHFDYL | 2 | 722
IGHD2-21 (13 reads)
CARGPAYCGSDCYSYFQHW | 2 | 723
IGHD4-23 (9 reads)
CARGGDYGGTPLTT | 6 | 724
IGHD1-7 (6 reads)
CARDGPPRITGTTEVTT | 3 | 725
IGHD1-1 (6 reads)
CARRVGASGTSIS | 4 | 726
IGHD3-16 (4 reads)
CARAHYDYVWGSYRSPPTT | 2 | 727
IGHD1-20 (4 reads)
IGHD4-4 (3 reads)
CARPVTTGTHRGYFDLW | 2 | 728
Unidentified (834 reads)
CASVGGTRVPGDPGLGT | 35 | 729
CAHLTITFGEFSERMLSTS | 29 | 730
CARLGYYDRRTT | 27 | 731
CAGEVVIWNSMTT | 18 | 732
CARGARGDNSTMT | 15 | 733
CARGGSRWPRTTLTT | 13 | 734
CARMGGPPTGTSIS | 12 | 735
CVRGGLYTIPT | 11 | 736
CARGGCGNYCPTTTSWTS | 11 | 737
CARRDSSRGTTLTT | 10 | 738
CARTTGTTTTTTWTS | 8 | 739
CARLSRYSNSPPSLTT | 8 | 740
CARHLGVRGPWALFIS | 7 | 741
CARDPPRMLLIS | 7 | 742
CAKGDIVTT | 7 | 743
CARGGGVSSRRITSTP | 6 | 744
CAREGVRSLTT | 6 | 745
CAKDKTYDTHGYSPF | 6 | 746
CASLLLPTVTGGVLLIS | 5 | 747
CARDYGATGSLDC | 5 | 748
CARDFGSGGVLITWPS | 5 | 749
CARYPGIEVTGTGALTT | 4 | 750
CARRGDVGNYCPTTTSWTS | 4 | 751
CARLPGITTTTTTWTS | 4 | 752
CARHVKPVDGNAYYEDSV | 4 | 753
CARGTRGISEPTKFDYW | 4 | 754
CARGGPERQLDDS | 4 | 755
CAHRRPDSSTWYAPTLTT | 4 | 756
CVSRRQTTPTSTVGPS | 3 | 757
CVRKEVMYFDP | 3 | 758
CGDTLGETMPVTA | 3 | 759
CATRRGQFWTT | 3 | 760
CARVVGGGVTTTTTVWTS | 3 | 761
CARVLLSGSTWYAEYFQSW | 3 | 762
CARTLSATGDNWFGPW | 3 | 763
CARTGARGDNSTMTS | 3 | 764
CARQTPGTLQTTTTTTVWTS | 3 | 765
CARPRYDYGLLLIS | 3 | 766
CARLTRRTTVVPRTSTT | 3 | 767
CARHVKPVDGNAYYEDSW | 3 | 768
CARHRGVRGPWALFIS | 3 | 769
CARGSPPGAVGFIGSTP | 3 | 770
CARGLSSSRSLSSTP | 3 | 771
CARGGATPGG | 3 | 772
CAREVPTGPRTSTTVWTS | 3 | 773
CARDPRADYLAFDIW | 3 | 774
CANGDTARPTGTLAT | 3 | 775
CAKAPSDTIIVHGPQHLTT | 3 | 776
CAARGRTTLTT | 3 | 777
CVRGSGRTGEAT | 2 | 778
CVREARTPATTYGWYYYDYW | 2 | 779
CVRDNSWSSRDAERYYYNMDVW | 2 | 780
CVRDLAWRTQQLLSENWFDPW | 2 | 781
CVRDLAWRTQQLLSEIGSTV | 2 | 782
CVRDLAWRTEQLLSENWFDPW | 2 | 783
CVRDLAWRTEELLSENWFDTW | 2 | 784
CVRDLAWRTEELLSENWFDPW | 2 | 785
CVRDLAWRTEELLSEIGSTL | 2 | 786
CVHRPRWLNIVANV | 2 | 787
CTWWQQLGEFLTS | 2 | 788
CTSLTSMVNFMLLMS | 2 | 789
CTRQEESSAAGTGGTSSP | 2 | 790
CTRDGVRGDLNPTLNV | 2 | 791
CMRHQHQRPRTT | 2 | 792
CITDCTGGSCDFAGPGEYW | 2 | 793
CATYYYKLVVIDTLTT | 2 | 794
CATGAATVLLTT | 2 | 795
CASRPGHHSGPLTT | 2 | 796
CASRPGHHSGPFDYW | 2 | 797
CASPVGGGET | 2 | 798
CARWPPIQGELLIS | 2 | 799
CARVRSGLLPTTTTTWTS | 2 | 800
CARVQLIGDSGYRPWTT | 2 | 801
CARVLRGPTTLTT | 2 | 802
CARQWGIRGVALTT | 2 | 803
CARPRYDLRFCLLIS | 2 | 804
CARNTEATTT | 2 | 805
CARMPGKEIAMADLATLTT | 2 | 806
CARLTRRTTVGTPDIDYV | 2 | 807
CARHVKPVDGNAYYEDS | 2 | 808
CARHLVW | 2 | 809
CARHDPVPQFKHGWTS | 2 | 810
CARGGPGRQLTMT | 2 | 811
CARGGGKDLLASYLTT | 2 | 812
CARGARSGSSMTA | 2 | 813
CARDYGATGSLDCW | 2 | 814
CARDVIGAAASYVAFDIW | 2 | 815
CARDEWFGSPKSRTLMLLIS | 2 | 816
CARAQNWDLLTGTSIS | 2 | 817
CARAPSIPVAVSATTLTT | 2 | 818
CAKHDGQSNTPDCW | 2 | 819
CAKGWTAARGALNTSST | 2 | 820
CAKGPPVVTTLDTSST | 2 | 821
CAKDRGGS | 2 | 822

TABLE 1-1H6 IGHA1 num num SEQ ID gene name reads CDR3 amino acids readsNO:  IGHD3-22 35 CARRPIPPLTMRVVVIPLTS 5 823 CARDPPMPMIVVQTLTT 2 824CARDPPMPMILVQTLTT 2 825 CAKILITMILVVSLMLLIS 2 826 IGHD6-13 27CGRSRHSSSWQILTP 11 827 CANGGLAAAGDHLTT 5 828 CARAPSIPVAGIGYHFDHW 3 829CARAPSIPVAGIATTLTT 2 830 IGHD3-10 26 CAREIRGTTMVRELTTSTATWTP 6 831CVRTYYFGLGDIITEITSTVWTS 3 832 CARRTYYYGSTNLTT 2 833 CARMVTDYYGSGNRGWFDPW2 834 CARDYYGSGVMTL 2 835 IGHD5-5 19 CMGPGDTAI 7 836 CARRPREMESAMVLSLTT2 837 IGHD3-9 19 CAHSAPYYDILSRNRARSWKDFDNW 3 838CAHSAPIMIFCLVTAHEVGRILTT 3 839 CATVALLRYFDWSSTR 2 840 IGHD3-3 17CTRRGGVVIICLTT 2 841 CIHTGNDFWTGTNYGLTS 2 842 CAKDRFSGRGRFEFMEWLTPLTT 2843 CAKDRFSEGKVQFMEWLTPLTT 2 844 IGHD6-19 14 CARGRRSLPAIVYSSGPDRPNWFDPW6 845 CTSAAVASSSGWPLRGVWTS 3 846 IGHD2-2 14 CARAPLCSSASCHLQLDYW 5 847IGHD1-26 13 CARDDSASYSRGTT 3 848 IGHD2-21 12 CARSHIVVVTAIPLEMLLIS 2 849IGHD4-23 11 CARGAGYGGNSGVRTT 9 850 IGHD2-15 11CAKDLAPLKSCSRGGCYPYYYGLDIW 3 851 IGHD4-17 9 CARTLYGDFVDF 2 852 IGHD6-6 5IGHD5-12 4 CARHVNGYDYLFPFTSW 3 853 IGHD3-16 4 CAKGVLSSGGVIATLPGSTP 3 854IGHD1-1 3 CARGGSQLERRRPLVTT 3 855 IGHD5-24 2 IGHD2-8 2 (unidentified)531 CARETVGGTLTT 19 856 CARISSGHDPPIITGWTS 13 857 CARSPIWFGSHRFTTTWRS 9858 CARDPLETGATSLII 9 859 CAKLGNRPGFTEWDHWFGPW 9 860 CVRDPHETGATTLIT 6861 CARIRKEVGAPPITWTS 6 862 CARGSWSGAAFYSLTT 6 863 CARDPNKFRTNHLSTT 6864 CATVPELTDISLPRLMALIS 5 865 CARVWGKHTLTT 5 866 CARDPNKFRPNHLSTT 5 867CARAGRELLRALMTT 5 868 CARAGAELLRALMTT 5 869 CARAEDYYDTEGYFYLTP 5 870CAHRTNYSTNRYGAFTTLTS 5 871 CVRDPQETGATTLIT 4 872 CATQCLGGAGLTTTTAPWTS 4873 CARRTYYSGSTNLTT 4 874 CARRTTRETGSSIS 4 875 CVRQYGLGSGSLTP 3 876CVKIRNLIGFTGSTP 3 877 CTRDGVRGDLNPTLNV 3 878 CDKAKVTADLRT 3 879CATVPELPDISLPRLMALIS 3 880 CATVFGRRYRLLTT 3 881 CARYRAAYPRRAWTS 3 882CARTIGFEIAMTGGLGALTP 3 883 CARRDPPVRASLSTTLTS 3 884CARFQRYCRGGSCSATLDAFDKW 3 885 CARDLGERRDGEPTNWFDAW 3 886 CARDLAVWATLTT 3887 CAKDVEPTVTLYNHFDP 3 888 CAKDFNWEGIT 3 889 CAHRTNYSTNRYGGLYYFDFW 3890 CVRGVGTILWLTI 2 891 CVRDAGPGGSLTS 2 892 
CTTGFSGSTACHWDHTACHWDDAFAMW2 893 CTHAVESLLGTTSTS 2 894 CIHTGNDFGPGPTMVWTS 2 895 CGVGRGDNDVDFKFKW 2896 CATRESPLTT 2 897 CATAGIELWRAGSTP 2 898 CARYRIAMATSPYFDYW 2 899CARTNFGSGGYILGDTTMVWTS 2 900 CARSAGYLHRRTS 2 901 CARRTYYSGSTNFDYW 2 902CARRDLPFGASLSTTLTS 2 903 CARPGFSYGPRLTP 2 904 CARKKIPTAGYSSLTT 2 905CARGSWMGRPFISLTT 2 906 CARGLRWADN 2 907 CARGGTSGLILDTTSTPWTS 2 908CAREMHIDSLTVGRAFDIW 2 909 CARDVPDIYSSGATDC 2 910 CARDPSYLPTPALKT 2 911CARDPNKFRPNHFVDYW 2 912 CARDLGTTNYWLDTW 2 913 CAKQRASGNSLTI 2 914CAKEPKIVGRRRTTLIT 2 915 CAKDLGVCSEGAASSLVLIS 2 916CAHSAPYYDICLVTAHEVGRILTT 2 917 CAGLIGRFIPLTT 2 918

TABLE 1-1H7 IGHA2 num num SEQ ID gene name reads CDR3 amino acids readsNO:  IGHD2-21 62 CAKDMCGLWASCGGDCYSRRTTSLTT 41 919CAKDMCGLWASCGGDCYSRRTASLTT 5 920 CARGPNMAFVVVTAILMLLIS 4 921CARAPDCGGSTCYSHPYYGMDVW 4 922 CARDPRIVVVAPATHTPTTVWTS 2 923 IGHD3-3 20CIYDFWSGGPHPTLTT 11 924 CARIVNTEGFWSGFLTP 4 925 CARIVNTEGFGVVFLTP 2 926IGHD2-15 18 CARAPDCGGGTCYSHPTTVWTS 5 927 CARARIVVVVPATLTPTTVWTS 3 928CARAPDCGGGTCYSHPYYGMDVW 3 929 IGHD6-13 14 CALCPTPIAAAGSVTT 5 930CALCPNPYSSGWFCNYW 3 931 IGHD1-26 9 CARGPATAILGATPSLTP 3 932CVRHDYSDNDLSTNWFGP 2 933 IGHD3-10 7 CATYYYGSGSAGHNFDYW 2 934CARGPGLSVMIRGVITTPNHILIT 2 935 IGHD2-8 6 CANVGGADRNYCINGVRHNPNYLTT 5 936IGHD3 -16 5 CARGFGARGVILT 2 937 IGHD5 -12 1 CVLSRGLVATRTLDYW 1 938IGHD4-17 1 CARTLYGDFVDSL 1 939 IGHD3 -22 1 CARDKQESSGSPRNYYFDYW 1 940(unidentified) 131 CAKGHQVRLRGRTGTSIS 11 941 CAGAPDCGVGAAPLTSTTVCTS 8942 CAGIGGATSTTTTTTWTS 6 943 CARRAAPHDYGHVLIF 5 944 CVRHDGSFTKTGSTP 4945 CVKIGAAH 4 946 CARLRCSNDNCAGHLYYYFSGLDIW 4 947 CARASLPRGLLIS 4 948CTKGGGRKTAGKFLTP 3 949 CARTARTGDL 3 950 CARIGHEFYSLTYSVNDVFDLW 3 951CAKGRGRRAAGKFLTT 3 952 CAKGAGRRAAGKFLTT 3 953 CVRFIGAYSNNWYPGYFDYW 2 954CASQSQNYYYYYMDVW 2 955 CASKKEILWAGPNLTT 2 956 CARVRCGLVASEGVLIS 2 957CARRAAPMTTGMFLIF 2 958 CARLRGGFPPVVKRVEVFLLTS 2 959 CARGRFARGGDDSLIS 2960 CAKAPGDLCRSTP 2 961 CAGIRGSNIYYHYYYMDVW 2 962 CADLPGIIGGEIT 2 963

TABLE 1-1H8 IGHG1 num num SEQ gene name reads CDR3 amino acids readsID NO:  IGHD3-22 52 CAKITSMIVVLIPTMMLLMS 20 964 CARGSRARFSSDTSGYQYFDYW 4965 CARGVYLYYDSHAYSVLTT 3 966 CARVNYYDSVVLTT 2 967 CARVNYYDSSRIDYW 2 968CARLPPFNNDDSSSYALYLTT 2 969 CARHSNYYYDTSGYRVLDAFDIW 2 970CARGGMDSYGYFYVGHYDYW 2 971 CAKITSMIVVLTPTMMLLMS 2 972 IGHD3-10 35CARLPRMVRGNWFHP 8 973 CARGAWAVRGVISWAGSTP 6 974 CAGSGSGSLLTTVWTP 4 975CVSITNSLLWFGELLIFDCW 2 976 IGHD6-13 31 CTRQEESSAAGTGGTSSP 7 977CALCPTPIAAAGSVTT 7 978 CATSEGDPVAAAGTKSWFDSW 3 979 CARLALLYGSSRYGATLTT 2980 CARGPSSTWYSFDYW 2 981 IGHD3-3 20 CAHSVGFILDFWSGYQNNWFDPW 4 982CAMGPTIFGVVFLGSLTS 2 983 CAHSVGFILDFWSGYQNNWFDPG 2 984CAHSVGFILDFGVVIRTTGSTP 2 985 IGHD3-16 19 CVRQSPLDDVWGVFAPVGSTP 11 986CVRQSPLDDVWGVFAPVGSTL 2 987 IGHD5-5 18 CARGVDTTMVRSTTLTT 7 988CARQDPYCSTSNCTMGGAMTLTT 5 989 IGHD5-24 14 CARTDGIRDGYNLHRVLTT 2 990CARTDAIRDGYNLHRVFDYW 2 991 IGHD3-9 11 CAREGRNYDSLTGDPWFDPW 2 992 IGHD1-711 CARGDCTTINCNTHSDYYGLDVW 3 993 CARTVGTGTTNGYLTS 2 994CAREIVLLSTATLTPTTTVWTS 2 995 IGHD4-17 10 CARHPKPPTVTSATT 2 996 IGHD5-129 CARQDSGYDYGYYHNGMDVW 2 997 IGHD2-2 9 CARHSLAYCSTTSCAVFDYW 2 998CARHGFEGREVVPPAMNEYYYYYMDVW 2 999 IGHD2-15 9 CALTGLNGRSCYSELLIS 2 1000IGHD4-23 7 CARGAGYGGNSGVRTT 6 1001 IGHD1-26 5 IGHD2-21 4 IGHD2-8 3(unidentified) 444 CARGATVGVETGSTP 32 1002 CARKGSRHGGSTP 28 1003CARQNGPSIGGGSTP 23 1004 CARGATPGAETGSTP 23 1005 CAKDTLGGMGGLTS 13 1006CARVRVLPEGVLISLRPLGSTTITWTS 11 1007 CATDRDSSWGTSLTT 9 1008CARRGGSTVTTGTSIS 6 1009 CARQDPYCSTSNCTMGGAMTLTT 4 1010 CAILPETQWYPRLTT 41011 CVHRPRWLNVVPT 3 1012 CVHRPRWLNVVPN 3 1013 CARLGKNHSQGVDYW 3 1014CARGFMVQASSVRLKRGQFLADSW 3 1015 CARGDWGTVTLATT 3 1016CARDNQPWRDARNLGGAFDVW 3 1017 CARDGLRPPPFMVTIQRGGLTT 3 1018CARAVGGFNSGWPSIGVPARSTP 3 1019 CARAVGGFNSGWPSIGVPARSTL 3 1020CAKSPKPWSQLVSTPIMPTPWTS 3 1021 CVRESTFYYFGPW 2 1022CVRDDDYSRTWYMGQGASSDYGMDVW 2 1023 CVKWVSGVLTSLTT 2 1024CATSGRSSAWYPDVFDIW 2 1025 CATNYCRGISCYPAPLTT 2 1026 CASMIALHHTLTS 2 1027CARYSPVDPSTLDFW 2 1028 CARVLDSSAHWYFDDW 2 1029 
CARQNGPSIGGGSTL 2 1030CARQHSEWEILRLVFDHW 2 1031 CARLPRMVRVTGSTP 2 1032 CARIDYVSTWYYDQW 2 1033CARICAEREFLSLLTP 2 1034 CARGDCTTINCNTHSTTTVWTS 2 1035 CARGATVGVETGSTL 21036 CARGATLGVETGWTP 2 1037 CAREYYGILYGYYFDYW 2 1038CARDNQPWRDARNLGVHLMC 2 1039 CARDGGLAGTGTLEY 2 1040 CARAGLVLGPYGMDIW 21041 CAKVAETLVSTGFDSYYAYSMDVW 2 1042 CAKTYDYGSRGFSILLIS 2 1043CAKGAGRRAAGKFLTT 2 1044 CAKAKRRSLGMQTLPTLRGRSDGFDVW 2 1045CAKADCGTGCFIVDDW 2 1046

TABLE 1-1H9 IGHG2 num num SEQ ID gene name reads CDR3 amino acids readsNO:  IGHD3-10 24 CARGRYAGGVIITALTP 13 1047 CSREVGRDYYGSGVIEITWTS 4 1048CSREVGRDYYGSGSYRNYMDVW 3 1049 IGHD2-15 22 CAKKEFILVVVITMMSLLMS 6 1050CAKDMTAKACSDYW 3 1051 CARVMGCRGGRCDFRAFDIW 2 1052 CARRFCSGGICYFLTT 21053 CAKEGVYFSGGNHYDVAFNVW 2 1054 IGHD6-25 21 CARVKGGIAGMAWTS 19 1055IGHD6-19 8 CARDLGSGWFRFDP 2 1056 CARDLGSGWFGSTP 2 1057 IGHD3-3 8CARPSRCCYSGGGRLTL 4 1058 CARPSRCCYVRGGRLTL 2 1059 IGHD5-24 7CARGKRDAYNYYSHLDSW 2 1060 CARGKEMPTITTLILTP 2 1061 IGHD4-17 4CAKGENTVTTGQEYW 2 1062 CAKGENTVTTGQEY 2 1063 IGHD3-22 4 CARDPDF 2 1064IGHD2-8 4 CAKSHHCTNGVCHPPRFGQRSTP 2 1065 IGHD6-13 3 IGHD5-5 3 IGHD2-21 3IGHD2-2 3 CASRYCTSDRCLGASGKPSFDTW 2 1066 IGHD4-23 2 IGHD1-20 2(unidentified) 317 CARGGPKKVVTAAHLSP 11 1067 CSTLGLGPPGGQTT 10 1068CARDHYDTRGVRMLLIS 10 1069 CARMVRGGGRTSSGYYYYYMDVW 9 1070 CARDGVWDLPTTLTT9 1071 CTMATVGHGLRRCFGKSTATLTS 6 1072 CVRMGPPCQLAGRSSSLTS 5 1073CSTLGLGPPGGLTT 5 1074 CMGPGETAI 5 1075 CARVSMIRFRVWGLWTS 5 1076CARVQRGAVVIPTT 5 1077 CARRRYNDLGAPNWVDPW 5 1078 CARGEDCGGGRCNNLPTTVWTS 51079 CAKRKLAPPRKFTTLTT 5 1080 CATLEGGAPPDLRRAEAFLLIS 4 1081CARGKDCGGGRCNNVPYYGMDVW 4 1082 CAKDGHKLTGTTTRTS 4 1083 CVRDLGAITPVFSTS 31084 CARSFVVKVHAHCGAVLSST 3 1085 CARRLNVAVVVPAYVGWFDPW 3 1086CARGKDCGGGRCNNVPTTGWTS 3 1087 CARDWEWQQRLNYFDP 3 1088 CVRRAAGGRSGLTT 21089 CVRPPPTVPGTAGSTP 2 1090 CVALFVPAGSTL 2 1091CTMATVGHGATTLFREVHRNTDFW 2 1092 CSTLGLGPRGADYW 2 1093 CSRTGGRLLIS 2 1094CSKVGRILKLIT 2 1095 CKVAVEMVLMY 2 1096 CGKFLGTTVASS 2 1097CATLTGGAPPDLRRAEAFLLIS 2 1098 CATEGTGAVTPFTT 2 1099 CATAPGGTSYT 2 1100CASRPSWGSSFDFW 2 1101 CASRPPGAAALTS 2 1102 CARRRYNDLGAPTGSTP 2 1103CARMVREEAERRPAIIITTWTS 2 1104 CARGPGWGMGSTKFDCW 2 1105 CARGPGGVWDRLSLTS2 1106 CARGGKSATGANYHQFFDCW 2 1107 CARDWEWQQRLNYFDPW 2 1108CARDHYYDERNQGPDW 2 1109 CARAGGHGTWTS 2 1110 CAKSLRVGGDVFEIW 2 1111CAKSDYFDP 2 1112 CAKGRGRLVTIATTLTT 2 1113 CAKAHFPGDLPSFSSIS 2 1114CAHQQWRPGRRGFDYW 2 1115

FIGS. 7 (A-D) show the repertoire of V region sequences of BCR reads for each isotype (BCR V repertoire). BCR V repertoires were very similar among IgM, IgG, IgA, and IgD, but only reads having IGHV3-30 were obtained for IgE. A possible reason is that IgE-positive cells are far fewer in peripheral blood than cells of the other classes, so that a biased repertoire was detected.

FIGS. 8 (A-D) show the V region repertoire for each subclass, that is, the BCR V repertoire of each IgA and IgG subclass. Within IgA, several V genes were used at different frequencies in IgA1 and IgA2: IGHV1-18 and IGHV4-39 were more frequent in IgA1 than in IgA2, while IGHV3-23 and IGHV3-74 were more frequent in IgA2 than in IgA1. Within IgG, IGHV3-23 and IGHV3-74, which were also increased in IgA2, were more frequent in IgG2 than in IgG1. Only 10 reads each were obtained for IgG3 and IgG4. In IgG3, clones with IGHV4-59-IGHJ4-IGHD1-7 accounted for 3/10 of the reads, indicating high clonality. In IgG4, reads with IGHV3-23-IGHJ4-IGHD3-10 accounted for 5/10 (Table 1-3).

TABLE 1-3 CDR3 amino acid sequences of BCR reads
Subclass | V | J | D | AA JUNCTION | Frequency
IgG3 | IGHV4-59 | IGHJ4 | IGHD1-7 | CARVVGNWNYEWIFDNW (SEQ ID NO: 20) | 3/10
IgG3 | IGHV3-30-3 | IGHJ3 | IGHD5-12 | CARMYRRVYGFDAW (SEQ ID NO: 21) | 2/10
IgG3 | IGHV1-18 | IGHJ4 | IGHD2-21 | CARRHYGDRGYYFDIW (SEQ ID NO: 22) | 1/10
IgG3 | IGHV1-2 | IGHJ3 | IGHD6-13 | CVRDRLPSWAAAGKDSFGLW (SEQ ID NO: 23) | 1/10
IgG3 | IGHV4-34 | IGHJ6 | IGHD3-10 | CARGRKLPVRGVRGMFYYYGVDVW (SEQ ID NO: 24) | 1/10
IgG3 | IGHV4-39 | IGHJ6 | IGHD6-19 | CARQIVRNSGWYVALDLW (SEQ ID NO: 25) | 1/10
IgG3 | IGHV4-59 | IGHJ3 | IGHD3-22 | CARQSLYRIYYSDSSGYRLDAFDIW (SEQ ID NO: 26) | 1/10
IgG4 | IGHV3-23 | IGHJ4 | IGHD3-10 | CAKYLMILGHFDIW (SEQ ID NO: 27) | 5/10
IgG4 | IGHV2-70 | IGHJ4 | IGHD6-13 | CARLLGSGWYHFDKW (SEQ ID NO: 28) | 2/10
IgG4 | IGHV1-69 | IGHJ4 | IGHD5-24 | CARGRPSRDGYRPPMYYFLDYW (SEQ ID NO: 29) | 1/10
IgG4 | IGHV3-21 | IGHJ3 | IGHD2-2 | CARGCSANCPTVAFDLW (SEQ ID NO: 30) | 1/10
IgG4 | IGHV3-23 | IGHJ5 | IGHD6-19 | CAKDRGNSGWWSWLDPW (SEQ ID NO: 31) | 1/10
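The clonality figures above, such as 5/10 for IGHV3-23-IGHJ4-IGHD3-10 in IgG4, are read frequencies of identical V-J-D/CDR3 combinations. Below is a minimal Python sketch of that calculation. It assumes each assigned read is represented as a (V, J, D, CDR3) tuple. The function name `clone_frequencies` and the tuple layout are illustrative and are not taken from Repertoire Genesis. The input reproduces the IgG4 rows of Table 1-3.

```python
from collections import Counter

def clone_frequencies(reads):
    """Return [(clonotype, fraction of reads)], most frequent first.

    Each read is a hypothetical (V, J, D, CDR3) tuple; identical tuples are
    counted as one clonotype.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return [(clone, n / total) for clone, n in counts.most_common()]

# IgG4 rows of Table 1-3, expanded to one tuple per read (10 reads in total).
igg4_reads = (
    [("IGHV3-23", "IGHJ4", "IGHD3-10", "CAKYLMILGHFDIW")] * 5
    + [("IGHV2-70", "IGHJ4", "IGHD6-13", "CARLLGSGWYHFDKW")] * 2
    + [("IGHV1-69", "IGHJ4", "IGHD5-24", "CARGRPSRDGYRPPMYYFLDYW")]
    + [("IGHV3-21", "IGHJ3", "IGHD2-2", "CARGCSANCPTVAFDLW")]
    + [("IGHV3-23", "IGHJ5", "IGHD6-19", "CAKDRGNSGWWSWLDPW")]
)

freqs = clone_frequencies(igg4_reads)
top_clone, top_fraction = freqs[0]
print(top_clone[:3], top_fraction)  # ('IGHV3-23', 'IGHJ4', 'IGHD3-10') 0.5
```

The two IGHV3-23 reads with different J genes (IGHJ4 and IGHJ5) are counted as separate clonotypes.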

FIG. 9 shows the BCR J repertoire (IGHJ usage) for each isotype. IGHJ4 was used in about half of the reads in IgM, IgG, IgA, and IgD, while IGHJ2 was hardly used. Only IGHJ1 was used in IgE. The IGHJ repertoire was also examined in the subclasses of IgG and IgA. Unlike the IGHV repertoire, no significant difference among subclasses was observed.

The above results demonstrated that unbiased quantitative analysis is possible with the sample providing method of the present invention.

Preparation Example 2: Analysis of TCR Repertoire in Peripheral Blood of Healthy Individuals

In the present Example, TCR repertoire analysis was performed on peripheral blood of healthy individuals.

(Materials and Methods)

(Sample)

Peripheral blood mononuclear cells of 10 healthy individuals

(Method)

(1. RNA Extraction)

5 mL of whole blood was collected from each of 10 healthy individuals into a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from the isolated PBMCs using an RNeasy Lipid Tissue Mini Kit (Qiagen, Germany). The resulting RNA was quantified using an Agilent 2100 Bioanalyzer (Agilent). The amount of RNA obtained is shown in the following Table 1-4.

TABLE 1-4 Amount of RNA
Sample | Elution volume (μL) | Concentration (ng/μL)
#1 | 30 | 1682
#2 | 30 | 274
#3 | 30 | 1007
#4 | 30 | 560
#5 | 30 | 988
#6 | 30 | 1327
#7 | 30 | 667
#8 | 30 | 258
#9 | 30 | 597
#10 | 30 | 624
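Every sample in Table 1-4 was eluted in 30 μL. The total RNA recovered is therefore the concentration multiplied by the elution volume. The short Python sketch below makes that conversion explicit. It assumes the full 30 μL eluate is recovered, and the dictionary and function names are illustrative.

```python
ELUTION_VOLUME_UL = 30  # elution volume for every sample in Table 1-4

# Sample number -> RNA concentration (ng/μL), from Table 1-4.
concentration_ng_per_ul = {
    1: 1682, 2: 274, 3: 1007, 4: 560, 5: 988,
    6: 1327, 7: 667, 8: 258, 9: 597, 10: 624,
}

def total_rna_ug(conc_ng_per_ul, volume_ul=ELUTION_VOLUME_UL):
    """Total RNA in μg = concentration (ng/μL) x volume (μL) / 1000."""
    return conc_ng_per_ul * volume_ul / 1000

yields_ug = {s: total_rna_ug(c) for s, c in concentration_ng_per_ul.items()}
for sample, ug in sorted(yields_ug.items()):
    print(f"#{sample}: {ug:.2f} ug")
```

Under this assumption, sample #1 yields about 50.5 μg and sample #8, the lowest, about 7.7 μg.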

(2. Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA sample was used to carry out adaptor-ligation PCR in accordance with the method shown in Preparation Example 1. Specifically, the BSL-18E primer (Table 1-5) was mixed and annealed with RNA, and complementary DNA was synthesized with a reverse transcriptase. Double-stranded DNA was subsequently synthesized, and T4 DNA polymerase was used to perform a 5′ terminal blunting reaction. After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the P20EA/P10EA adaptor was added in a ligation reaction. The adaptor-ligated double-stranded complementary DNA was purified on a column and digested with the NotI restriction enzyme.

TABLE 1-5 Primer sequences
Primer | Sequence
BSL-18E | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 32)
P20EA | TAATACGACTCCGAATTCCC (SEQ ID NO: 33)
P10EA | GGGAATTCGG (SEQ ID NO: 34)
CA1 | TGTTGAAGGCGTTTGCACATGCA (SEQ ID NO: 35)
CA2 | GTGCATAGACCTCATGTCTAGCA (SEQ ID NO: 36)
CB1 | GAACTGGACTTGACAGCGGAACT (SEQ ID NO: 37)
CB2 | AGGCAGTATCTGGAGTCATTGAG (SEQ ID NO: 38)

(3. PCR)

The 1^(st) PCR was performed on the double-stranded complementary DNA using the common adaptor primer P20EA and a TCR α chain or β chain C region-specific primer (CA1 or CB1) shown in Table 1-1. PCR was run for 20 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. with the following reaction composition.

TABLE 1-2A 1^(st) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CA1 or CB1 primer | 0.5 | 250 nM
Double stranded complementary DNA | 2 |
Sterilized water | 7 |
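The final concentrations in this table follow from C1·V1 = C2·V2 with a 20 μL total volume (10 + 0.5 + 0.5 + 2 + 7 μL). The Python sketch below checks the primer entry. Adding 0.5 μL of a 10 μM stock gives the listed 250 nM, whereas a 10 mM stock would give 250 μM. The listed final concentrations are therefore consistent with a 10 μM primer stock. The function name is illustrative.

```python
def final_concentration(stock, added_ul, total_ul):
    """Dilution relation C1*V1 = C2*V2, solved for C2 (same unit as stock)."""
    return stock * added_ul / total_ul

# Table 1-2A components (μL).
components_ul = {
    "2x ExTaq Premix": 10,
    "P20EA primer": 0.5,
    "CA1 or CB1 primer": 0.5,
    "double stranded cDNA": 2,
    "sterilized water": 7,
}
total_ul = sum(components_ul.values())  # 20.0 μL

stock_nM = 10_000  # 10 μM expressed in nM
primer_nM = final_concentration(stock_nM, components_ul["P20EA primer"], total_ul)
print(total_ul, primer_nM)  # 20.0 250.0
```

The same relation reproduces the 500 nM entries of the later PCR tables, where 1 μL of a 10 μM stock is added to 20 μL.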

The 1^(st) PCR amplicon was then used for the 2^(nd) PCR with the P20EA primer and a TCR α chain or β chain C region-specific primer (CA2 or CB2), using the reaction composition shown below. PCR was run for 20 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 1-2B 2^(nd) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CA2 or CB2 primer | 1 | 500 nM
1^(st) PCR amplicon | 2 |
Sterilized water | 6 |

Primers were removed from the 2^(nd) PCR amplicon, the product of the second PCR amplification reaction, with a High Pure PCR Cleanup Micro Kit (Roche). For analysis with Roche's next-generation sequencer (GS Junior Bench Top system), the 2^(nd) PCR amplicon was diluted 10-fold and used as the template for a further amplification. This amplification used two kinds of primer. The first was the B-P20EA primer, which is the P20EA adaptor primer with an adaptor B sequence added. The second was HuVaF-01 to HuVaF-10 (α chain) or HuVbF-01 to HuVbF-10 (β chain), which are TCR α chain or β chain C region-specific primers with an adaptor A sequence and an MID Tag sequence (MID-1 to 26) added, as shown in FIG. 10. The primer sequences used are shown in Table 1-6. PCR was run for 10 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. To confirm PCR amplification, 10 μL of amplicon was analyzed by 2% agarose gel electrophoresis (FIG. 11).

TABLE 1-6 Sequencing primers
Primer | Sequence | MID tag
HuVaF-01 | CCATCTCATCCCTGCGTGTCTCCGACTCAGACGAGTGCGTATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 40) | MID-1
HuVaF-02 | CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCTCGACAATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 41) | MID-2
HuVaF-03 | CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACGCACTCATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 42) | MID-3
HuVaF-04 | CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCACTGTAGATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 43) | MID-4
HuVaF-05 | CCATCTCATCCCTGCGTGTCTCCGACTCAGATCAGACACGATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 44) | MID-5
HuVaF-06 | CCATCTCATCCCTGCGTGTCTCCGACTCAGATATCGCGAGATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 45) | MID-6
HuVaF-07 | CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTGTCTCTAATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 46) | MID-7
HuVaF-08 | CCATCTCATCCCTGCGTGTCTCCGACTCAGCTCGCGTGTCATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 47) | MID-8
HuVaF-09 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTCTATGCGATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 48) | MID-10
HuVaF-10 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTGATACGTCTATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 49) | MID-11
HuVbF-01 | CCATCTCATCCCTGCGTGTCTCCGACTCAGATACGACGTAACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 50) | MID-15
HuVbF-02 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACGTACTAACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 51) | MID-16
HuVbF-03 | CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTCTAGTACACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 52) | MID-17
HuVbF-04 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTACGTAGCACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 53) | MID-18
HuVbF-05 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTACTACTCACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 54) | MID-19
HuVbF-06 | CCATCTCATCCCTGCGTGTCTCCGACTCAGACGACTACAGACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 55) | MID-20
HuVbF-07 | CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTAGACTAGACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 56) | MID-21
HuVbF-08 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTACGAGTATGACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 57) | MID-22
HuVbF-09 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTACTCTCGTGACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 58) | MID-23
HuVbF-10 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGAGACGAGACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 59) | MID-24
B-P20EA | CCTATCCCCTGTGTGCCTTGGCAGTCTAATACGACTCCGAATTCCC (SEQ ID NO: 60) | (none)
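Each HuVaF/HuVbF primer in the sequencing primer table can be read as a fusion of three parts:

- the 30-nt 5′ segment shared by all entries (the adaptor A sequence);
- a 10-nt MID tag;
- a 23-nt TCR-specific 3′ segment (ATAGGCAGACAGACTTGTCACTG for the α chain, ACACCAGTGTGGCCTTTTGGGTG for the β chain).

This decomposition is inferred here from the listed sequences, not stated in the source. The Python sketch below rebuilds several of the primers from those parts. The MID sequences are read off the table, and the names are illustrative.

```python
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"   # 5' segment common to all HuVaF/HuVbF primers
TRA_SPECIFIC = "ATAGGCAGACAGACTTGTCACTG"       # 3' segment of the HuVaF (alpha chain) primers
TRB_SPECIFIC = "ACACCAGTGTGGCCTTTTGGGTG"       # 3' segment of the HuVbF (beta chain) primers

# A few MID tags as they appear inside the tabulated primers.
MID_TAGS = {
    "MID-1": "ACGAGTGCGT",
    "MID-2": "ACGCTCGACA",
    "MID-15": "ATACGACGTA",
    "MID-16": "TCACGTACTA",
}

def fusion_primer(mid_name, specific):
    """Concatenate adaptor A, the sample's MID tag and the TCR-specific segment (5'->3')."""
    return ADAPTOR_A + MID_TAGS[mid_name] + specific

hu_vaf_01 = fusion_primer("MID-1", TRA_SPECIFIC)
hu_vbf_01 = fusion_primer("MID-15", TRB_SPECIFIC)
print(len(hu_vaf_01), hu_vaf_01)
```

Because only the MID differs between primers of the same chain, reads from the pooled library can later be assigned back to each individual by the MID at the start of the read.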

TABLE 1-2C 3^(rd) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer | 1 | 500 nM
10 μM HuVaF or HuVbF primer | 1 | 500 nM
2^(nd) PCR amplicon | 1 |
Sterilized water | 7 |

After PCR amplification, the band of about 600 bp visualized by agarose gel electrophoresis (FIG. 11) was excised and purified using a DNA purification kit (QIAEX II Gel Extraction Kit, Qiagen). The amount of DNA in the collected PCR amplicon was measured using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen). The amounts of DNA collected from each of the 10 healthy individuals are shown in Table 1-7.

TABLE 1-7 Amount of amplified DNA collected from healthy individuals
Sample | TCRα chain MID Tag | TCRα chain amount of DNA (ng/μL) | TCRβ chain MID Tag | TCRβ chain amount of DNA (ng/μL)
#1 | MID-1 | 2286 | MID-15 | 857
#2 | MID-2 | 2840 | MID-16 | 526
#3 | MID-3 | 2970 | MID-17 | 253
#4 | MID-4 | 2982 | MID-18 | 1194
#5 | MID-5 | 3470 | MID-19 | 534
#6 | MID-6 | 3512 | MID-20 | 543
#7 | MID-7 | 3471 | MID-21 | 623
#8 | MID-8 | 3201 | MID-22 | 756
#9 | MID-10 | 2936 | MID-23 | 744
#10 | MID-11 | 2744 | MID-24 | 798

(4. Next Generation Sequencing)

Next generation sequencing was carried out on Roche's GS Junior sequencer. Specifically, emPCR was carried out with a GS Junior Titanium emPCR Kit (Lib-L) in accordance with the manufacturer's protocol at a DNA-to-bead ratio of 0.5 copies per bead (cpb). After emPCR, the DNA-carrying beads were recovered by bead enrichment. A sequencing run was then carried out with the GS Junior Titanium Sequencing Kit and PicoTiterPlate Kit in accordance with the manufacturer's protocol.
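Loading emPCR at 0.5 copies per bead keeps most template-bearing beads clonal. The sketch below assumes that templates distribute over beads as a Poisson process. This is the usual idealization, but it is an assumption not stated in the source. Under it, the expected bead fractions can be computed in Python as follows. `bead_occupancy` is an illustrative name.

```python
import math

def bead_occupancy(cpb):
    """Poisson fractions of beads with 0, exactly 1, and >= 2 templates at mean cpb."""
    p_empty = math.exp(-cpb)
    p_single = cpb * math.exp(-cpb)
    p_multi = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multi

p_empty, p_single, p_multi = bead_occupancy(0.5)
clonal_share = p_single / (1.0 - p_empty)  # clonal fraction among template-bearing beads
print(f"empty {p_empty:.3f}, single {p_single:.3f}, mixed {p_multi:.3f}, clonal {clonal_share:.3f}")
```

At 0.5 cpb, roughly 61% of beads would carry no template, 30% exactly one, and 9% two or more. About 77% of template-bearing beads would thus be clonal, which is why bead enrichment and a cpb below 1 are used together.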

(5. Data Analysis)

The resulting sequence data (SFF file) were sorted into read sequences for each MID Tag, and a sequence file in FASTA format was created with the software supplied with the GS Junior (sfffile or sffinfo). The mean number of reads was 17840 for TRA and 5122 for TRB. The percentage of raw reads of 200 bp or longer was 34.9-63.7% (mean 42.2%) for TRA and 68.8-78.7% (mean 73.1%) for TRB (Table 1-8). The newly developed repertoire analysis software (Repertoire Genesis, patent pending) was then used to collate each read with the reference sequences in the IMGT (the international ImMunoGeneTics information system, www dot imgt dot org) database. This step assigned the V region, D region, and J region of each read and determined its CDR3 sequence. The number of assigned reads is shown in Table 1-8. Further, the frequency of identical reads was analyzed, and the usage frequency of V, D, and J chains was studied. FIGS. 13 (A-D), 14 (A-D), 15 (A-D), and 16 show TRV and TRJ repertoires generated from the reads obtained with Repertoire Genesis.
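The sorting of reads by MID Tag and the 200 bp length statistic reported in Table 1-8 can be illustrated with the Python sketch below. It is not the GS Junior sfffile/sffinfo implementation. It assumes each read starts with its exact 10-nt MID, with no sequencing errors in the tag, and it applies the length cutoff to the raw read. `demultiplex` and the toy reads are hypothetical.

```python
MID_TAGS = {"MID-1": "ACGAGTGCGT", "MID-15": "ATACGACGTA"}  # subset of the sequencing-primer MIDs

def demultiplex(reads, mid_tags, min_length=200):
    """Bin reads by MID prefix.

    Returns (kept, totals): kept maps each MID to its reads of at least
    min_length bp with the tag removed; totals counts all reads per MID.
    Reads matching no MID are dropped.
    """
    kept = {name: [] for name in mid_tags}
    totals = {name: 0 for name in mid_tags}
    for read in reads:
        for name, tag in mid_tags.items():
            if read.startswith(tag):
                totals[name] += 1
                if len(read) >= min_length:
                    kept[name].append(read[len(tag):])  # strip tag before V/D/J assignment
                break  # equal-length distinct tags: at most one can match
    return kept, totals

toy_reads = [
    "ACGAGTGCGT" + "A" * 250,   # MID-1, long enough
    "ACGAGTGCGT" + "C" * 100,   # MID-1, too short
    "ATACGACGTA" + "G" * 300,   # MID-15, long enough
    "TTTTTTTTTT" + "A" * 300,   # unknown tag
]
kept, totals = demultiplex(toy_reads, MID_TAGS)
for name in MID_TAGS:
    print(name, totals[name], len(kept[name]))
```

A production pipeline would typically tolerate mismatches in the tag. Exact matching is used here only to keep the logic visible.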

TABLE 1-8
TRA
ID | Number of reads | Reads of 200 bp or greater | Ratio | Number of assigned reads
MID1 | 18429 | 6541 | 35.5% | 5187
MID2 | 12954 | 8248 | 63.7% | 6904
MID3 | 17866 | 7883 | 44.1% | 6328
MID4 | 19055 | 6649 | 34.9% | 5201
MID5 | 17837 | 7974 | 44.7% | 6374
MID6 | 15872 | 7467 | 47.0% | 5925
MID7 | 17208 | 7581 | 44.1% | 5735
MID8 | 17184 | 7309 | 42.5% | 5656
MID10 | 21422 | 8175 | 38.2% | 6502
MID11 | 20569 | 7519 | 36.6% | 5989
Mean | 17840 | 7535 | 42.2% | 5980

TRB
ID | Number of reads | Reads of 200 bp or greater | Ratio | Number of assigned reads
MID15 | 5760 | 4108 | 71.3% | 3438
MID16 | 6067 | 4283 | 70.6% | 3487
MID17 | 5308 | 4080 | 76.9% | 3420
MID18 | 5417 | 4004 | 73.9% | 3501
MID19 | 3314 | 2279 | 68.8% | 1914
MID20 | 3365 | 2364 | 70.3% | 2044
MID21 | 5722 | 4181 | 73.1% | 3609
MID22 | 6148 | 4453 | 72.4% | 3754
MID23 | 5462 | 4047 | 74.1% | 3453
MID24 | 4657 | 3665 | 78.7% | 3252
Mean | 5122 | 3746 | 73.1% | 3187
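The Mean row of Table 1-8 can be reproduced from the per-sample counts. The Python sketch below does this for TRA. It shows that the reported mean ratio (42.2%) is the pooled ratio, that is, mean reads of 200 bp or greater divided by mean total reads. Averaging the ten per-sample ratios instead gives about 43.1%. The function name is illustrative.

```python
# TRA rows of Table 1-8: MID -> (number of reads, reads of 200 bp or greater)
tra_counts = {
    "MID1": (18429, 6541), "MID2": (12954, 8248), "MID3": (17866, 7883),
    "MID4": (19055, 6649), "MID5": (17837, 7974), "MID6": (15872, 7467),
    "MID7": (17208, 7581), "MID8": (17184, 7309), "MID10": (21422, 8175),
    "MID11": (20569, 7519),
}

def summarize(counts):
    """Return (mean reads, mean long reads, pooled ratio, mean of per-sample ratios)."""
    n = len(counts)
    total_reads = sum(reads for reads, _ in counts.values())
    total_long = sum(long_ for _, long_ in counts.values())
    pooled_ratio = total_long / total_reads
    mean_of_ratios = sum(long_ / reads for reads, long_ in counts.values()) / n
    return round(total_reads / n), round(total_long / n), pooled_ratio, mean_of_ratios

mean_reads, mean_long, pooled, per_sample_mean = summarize(tra_counts)
print(mean_reads, mean_long, f"{pooled:.1%}", f"{per_sample_mean:.1%}")  # 17840 7535 42.2% 43.1%
```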

FIG. 10 shows the amplification method for a TCR gene. Amplification was performed with the B-P20EA primer, which is the P20EA adaptor primer with a B-adaptor added, and a 3^(rd) nested primer with an A-adaptor and an MID Tag sequence (MID-1 to 26) added.

FIG. 11 shows results of examining GS-PCR amplicons. 10 μL of GS-PCR amplicons derived from 10 healthy individuals were electrophoresed on a 2% agarose gel. The top row shows GS-PCR (TRA) (TCRα chain amplicons), and the bottom row shows GS-PCR (TRB) (TCRβ chain amplicons).

FIG. 12 shows the parameter settings of the TCR/BCR repertoire analysis software (Repertoire Genesis).

FIG. 13 shows the TRAV repertoire in healthy individuals. The TRAV repertoires of 10 healthy individuals and their mean are shown. TRAV9-2, TRAV12, and TRAV13 were present at high frequency. TRAV20 in #1 and TRAV21 in #5 were higher than in the other healthy individuals, showing variation among individuals.

FIG. 14 shows the TRBV repertoire in healthy individuals. The TRBV repertoires of 10 healthy individuals and their mean are shown. TRBV20-1, TRBV28, and TRBV29-1 were present at high frequency. TRBV3-1 in #8 was higher than in the other healthy individuals, showing variation among individuals.

FIG. 15 shows the TRAJ repertoire in healthy individuals. The TRAJ repertoires of 10 healthy individuals and their mean are shown. In healthy individuals, each TRAJ family accounted for about 5% or less of the reads. TRAJ12 in #1, TRAJ27 in #4, TRAJ37 in #5, and TRAJ45 in #8 were higher than in the other healthy individuals, showing variation among individuals.

FIG. 16 shows the TRBJ repertoire in healthy individuals. The TRBJ repertoires of 10 healthy individuals and their mean are shown. In the TRBJ repertoires of healthy individuals, TRBJ2-1, TRBJ2-3, and TRBJ2-7 were present at high frequency, and TRBJ2-2 was high in #8, showing variation among individuals.
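The per-individual repertoires and mean values in FIGS. 13 to 16 are gene usage frequencies. A minimal Python sketch is shown below, based on two assumptions. First, each assigned read is reduced to its TRV or TRJ gene name. Second, the mean gives each individual equal weight; the source does not state how the mean was computed. The gene lists are toy data, not study results, and the function names are illustrative.

```python
from collections import Counter

def usage_frequencies(assigned_genes):
    """Fraction of reads assigned to each gene in one individual."""
    counts = Counter(assigned_genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

def mean_repertoire(per_individual):
    """Average usage per gene across individuals; absent genes count as 0."""
    genes = set().union(*per_individual)
    n = len(per_individual)
    return {gene: sum(freqs.get(gene, 0.0) for freqs in per_individual) / n for gene in genes}

# Toy TRBJ assignments for two individuals (hypothetical).
individual_a = usage_frequencies(["TRBJ2-1", "TRBJ2-1", "TRBJ2-1", "TRBJ2-7"])
individual_b = usage_frequencies(["TRBJ2-1", "TRBJ2-2"])
mean = mean_repertoire([individual_a, individual_b])
print(sorted(mean.items()))  # [('TRBJ2-1', 0.625), ('TRBJ2-2', 0.25), ('TRBJ2-7', 0.125)]
```

Weighting individuals equally keeps a sample with many reads from dominating the mean. Pooling all reads first would weight them by read count instead.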

Thus, these results demonstrate that unbiased quantitative analysis is also possible for TCRs by using a sample prepared by the preparation method of the present invention.

Preparation Example 3: Amplification of TCR and BCR Genes by Unbiased Adaptor-Ligation PCR

In the present Example, TCR and BCR genes were amplified by unbiased adaptor-ligation PCR.

(Materials and Methods)

(Sample)

Peripheral blood mononuclear cells of a healthy individual

(Method)

(1. RNA Extraction)

5 mL of whole blood was collected from one healthy individual into a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from 5×10⁶ of the isolated PBMCs using an RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany).

(2. Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA sample was used to carry out adaptor-ligation PCR. First, to synthesize complementary DNA, the BSL-18E primer (Table 1-1) and 3.5 μL (812 ng) of RNA were mixed and annealed for 8 minutes at 70° C. After cooling on ice, a reverse transcription reaction was performed in the presence of an RNase inhibitor (RNasin) with the following composition to synthesize complementary DNA.

TABLE 1-3A Synthesis of complementary DNA
Reagent | Content (μL) | Final concentration
RNA solution | 3.5 |
200 μM BSL-18E | 1.5 | 30 μM
Total | 5 | 70° C., 8 minutes
5× First strand buffer | 2 | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂
0.1 M DTT | 1 | 10 mM
10 mM dNTPs | 0.5 | 500 μM
RNasin (Promega) | 0.5 | 2 U/μL
SuperScript III™, 200 U/μL (Invitrogen) | 1 | 20 U/μL

The complementary DNA was subsequently incubated for 2 hours at 16° C. in the following double-stranded DNA synthesis buffer in the presence of E. coli DNA polymerase I, E. coli DNA ligase, and RNase H to synthesize double-stranded complementary DNA. T4 DNA polymerase was then added and reacted for 5 minutes at 16° C. to perform a 5′ terminal blunting reaction.

TABLE 1-3B Synthesis of complementary double stranded DNA
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 9 |
Sterilized water | 46.5 |
5× Second strand buffer | 15 | 25 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.15 mM β-NAD+, 1.2 mM DTT
10 mM dNTPs | 1.5 | 0.2 mM
E. coli DNA ligase, 10 U/μL (Invitrogen) | 0.5 | 0.067 U/μL
E. coli DNA polymerase, 10 U/μL (Invitrogen) | 2 | 0.27 U/μL
RNase H, 2 U/μL (Invitrogen) | 0.5 | 0.013 U/μL
Total | 75 | 16° C., 2 hours
T4 DNA polymerase, 5 U/μL (Invitrogen) | 1 | 0.067 U/μL (16° C., 5 minutes)

After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the double-stranded DNA was incubated overnight at 16° C. with the P20EA/P10EA adaptor (Table 1-1) and T4 DNA ligase in the following T4 ligase buffer for a ligation reaction.

TABLE 1-3C Adaptor adding reaction
Reagent | Content (μL) | Final concentration
Complementary double stranded DNA solution | 12.5 |
T4 ligase buffer | 5 | 50 mM Tris-HCl, pH 7.6, 10 mM MgCl₂, 1 mM ATP, 5% PEG5000, 1 mM DTT
50 μM P20EA/P10EA adaptor | 5 | 10 μM
T4 DNA ligase, 1 U/μL (Invitrogen) | 2.5 | 0.1 U/μL
Total | 25 | 16° C., overnight

The adaptor-ligated double-stranded DNA, purified on a column as described above, was digested with the NotI restriction enzyme (50 U/μL, Takara) with the following composition to remove the adaptor added to the 3′ terminal.

TABLE 1-3D Restriction enzyme treatment
Reagent | Content (μL) | Final concentration
Complementary double stranded DNA solution | 34 |
10× restriction enzyme buffer | 5 | 50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM DTT, 100 mM NaCl
0.1% BSA | 5 | 0.01%
0.1% Triton X-100 | 5 | 0.01%
NotI, 50 U/μL (Takara) | 1 | 1 U/μL
Total | 50 | 37° C., 2 hours
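The NotI digestion relies on the BSL-18E cDNA primer carrying a NotI recognition site (GCGGCCGC) upstream of its oligo(dT) stretch. The end of the double-stranded cDNA that derives from the poly(A) end of the mRNA therefore carries the site, while the P20EA/P10EA adaptor sequences do not. The Python sketch below scans the primer and adaptor sequences listed in the primer tables for the site. It assumes, without checking, that the TCR/BCR insert itself contains no NotI site. The function names are illustrative.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence (palindromic; cuts GC^GGCCGC)

PRIMERS = {
    "BSL-18E": "AAAGCGGCCGCATGC" + "T" * 18 + "VN",  # V and N are IUPAC degenerate bases
    "P20EA": "TAATACGACTCCGAATTCCC",
    "P10EA": "GGGAATTCGG",
}

def find_sites(seq, site=NOTI_SITE):
    """Return 0-based start positions of every occurrence of site in seq."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

for name, seq in PRIMERS.items():
    print(name, find_sites(seq))  # BSL-18E [3], then P20EA [] and P10EA []

# The site is its own reverse complement, so scanning one strand is enough.
assert reverse_complement(NOTI_SITE) == NOTI_SITE
```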

The 1^(st) PCR was performed on the double-stranded complementary DNA using the common adaptor primer P20EA and either a TCR C region-specific primer (CA1, CB1, CG1, CD1) or an immunoglobulin isotype C region-specific primer (CM1, CA1, CG1, CD1, CE1, CK1, CL1). Primers were set at the 3′ terminal side, the middle portion, or the 5′ side of the C region so that sequences covering the full length of the C region could be amplified. PCR was run for 20 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. with the following reaction composition. The primer sequences used are shown in Table 1-1.

TABLE 1-3E 1st PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM specific primer: TRAC (2 types), TRBC (3 types), TRGC (2 types), TRDC (2 types), IGHM (3 types), IGHG (3 types), IGHA (3 types), IGHD (3 types), IGHE (3 types), IGLK (1 type), IGLL (1 type) | 0.5 | 250 nM
Double stranded complementary DNA | 2 |
Sterilized water | 7 |

The 1^(st) PCR amplicon, the product of the first PCR amplification reaction, was then used for nested PCR with the P20EA primer and each of the immunoglobulin isotype C region-specific primers, using the reaction composition shown below. PCR was run for 20 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. The primer sequences used are shown in Table 1-1.

TABLE 1-3F 2^(nd) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM specific primer: TRAC (2 types), TRBC (3 types), TRGC (2 types), TRDC (2 types), IGHM (3 types), IGHG (3 types), IGHA (3 types), IGHD (3 types), IGHE (3 types), IGLK (1 type), IGLL (1 type) | 1 | 500 nM
1^(st) PCR amplicon | 2 |
Sterilized water | 6 |

Electrophoresis of each 2^(nd) PCR amplicon on a 2% agarose gel showed amplicons of the expected size (FIG. 17).

TABLE 1-9 Primer sequences
Primer | Target | Step | Sequence
BSL-18E | All | cDNA primer | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 1116)
P20EA | All | Adaptor | TAATACGACTCCGAATTCCC (SEQ ID NO: 1117)
P10EA | All | Adaptor | GGGAATTCGG (SEQ ID NO: 1118)
TCR primers
CA1 | TRAC | 1st | TGTTGAAGGCGTTTGCACATGCA (SEQ ID NO: 1119)
CA2 | TRAC | 2nd | GTGCATAGACCTCATGTCTAGCA (SEQ ID NO: 1120)
TRAC-3ter-1st | TRAC | 1st | CAGCGTCATGAGCAGATT (SEQ ID NO: 1121)
TRAC-3ter-2nd | TRAC | 2nd | ACTTTCAGGAGGAGGATT (SEQ ID NO: 1122)
CB1 | TRBC | 1st | GAACTGGACTTGACAGCGGAACT (SEQ ID NO: 1123)
CB2 | TRBC | 2nd | AGGCAGTATCTGGAGTCATTGAG (SEQ ID NO: 1124)
TRBC-Center-1st | TRBC | 1st | ACAGTCTGCTCTACCCCA (SEQ ID NO: 1125)
TRBC-Center-2nd | TRBC | 2nd | GTCCACTCGTCATTCTCC (SEQ ID NO: 1126)
TRBC-3ter-1st | TRBC | 1st | GAATCCTTTCTCTTGACC (SEQ ID NO: 1127)
TRBC-3ter-2nd | TRBC | 2nd | TTCCCTAGCAAGATCTCATA (SEQ ID NO: 1128)
BCR primers
CM1 | IGHM | 1st | TCCTGTGCGAGGCAGCCAA (SEQ ID NO: 1129)
CM2 | IGHM | 2nd | GTATCCGACGGGGAATTCTC (SEQ ID NO: 1130)
IGHM-Cent-1st | IGHM | 1st | GATGTTGGTGTGGGTTTTCA (SEQ ID NO: 1131)
IGHM-Cent-2nd | IGHM | 2nd | ATAGGTGGTCAGGTCTGTGA (SEQ ID NO: 1132)
IGHM-3ter-1st | IGHM | 1st | TAGAAGAGGCTCAGGAGGAA (SEQ ID NO: 1133)
IGHM-3ter-2nd | IGHM | 2nd | TTCCATTCCTCTTCGGACAC (SEQ ID NO: 1134)
CA1 | IGHA | 1st | GCTGGCTGCTCGTGGTGTAC (SEQ ID NO: 1135)
CA2 | IGHA | 2nd | GGGAAGTTTCTGGCGGTCACG (SEQ ID NO: 1136)
IGHA-Cent-1st | IGHA | 1st | GAATGTGTTTCCGGATTTTG (SEQ ID NO: 1137)
IGHA-Cent-2nd | IGHA | 2nd | TAGCAGCCACAGAGGTCA (SEQ ID NO: 1138)
IGHA-3ter-1st | IGHA | 1st | GCCATGACAACAGACACATT (SEQ ID NO: 1139)
IGHA-3ter-2nd | IGHA | 2nd | GGTCGATGGTCTTCTGTGT (SEQ ID NO: 1140)
CG1 | IGHG | 1st | CACCTTGGTGTTGCTGGGCTT (SEQ ID NO: 1141)
CG2 | IGHG | 2nd | TCCTGAGGACTGTAGGACAGC (SEQ ID NO: 1142)

FIGS. 18-25 show primer positions relative to the template. The figures show that a broad range of regions is suitable for the PCR primers of the present invention. It is also understood that a specific sequence can be determined as appropriate based on the principles of the present invention.

Preparation Example 4: Detection of Tumor Cells Using Human Acute Lymphoblastic Leukemia Cell Line

In the present Example, tumor cells were detected using a human acute lymphoblastic leukemia cell line.

(Materials and Methods)

(Sample)

Peripheral blood mononuclear cells of a healthy individual; MOLT-4 human acute lymphoblastic leukemia cell line

(Method)

(1. Culture of T Cell Based Leukemia Cell Line)

The human acute lymphoblastic leukemia cell line Molt-4 was used as a T cell-derived cell line expressing a T cell receptor (TCR). Cells were cultured in RPMI-1640 medium containing 10% fetal bovine serum, 100 IU/mL penicillin, 100 μg/mL streptomycin, and 2 mM L-glutamine at 37° C. under 5% CO₂. A total of 1×10⁷ cells were collected, washed, and suspended in RPMI-1640 medium at 1×10⁶ cells/mL.

(2. Separation of Peripheral Blood Mononuclear Cells of Healthy Individual)

5 mL of whole blood was collected from one healthy individual into a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. The cells were washed, counted, and suspended in RPMI-1640 medium at 1×10⁶ cells/mL.

(3. Preparation of Serial Diluent of Cells)

The 1×10⁶ cells/mL PBMC suspension and the 1×10⁶ cells/mL Molt-4 suspension were mixed to give the cell numbers below, preparing a serially diluted Molt-4 cell suspension.

TABLE 1-4A
Molt-4 (%) | PBMC (cells) | Molt-4 (cells)
100% | 0 | 1 × 10⁶
10% | 9 × 10⁵ | 1 × 10⁵
1% | 9.9 × 10⁵ | 1 × 10⁴
0.1% | 9.99 × 10⁵ | 1 × 10³
0.01% | 9.999 × 10⁵ | 1 × 10²
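Each mixture in the dilution table holds 1 × 10⁶ cells in total. The Molt-4 count is the total times the target fraction, and the PBMC count is the remainder. Because both stock suspensions are at 1 × 10⁶ cells/mL (1000 cells/μL), cell numbers translate directly into volumes. The Python sketch below reproduces the table under those conditions. The function name is illustrative, and how the smallest volumes were pipetted in practice is not stated in the source.

```python
TOTAL_CELLS = 1_000_000      # cells per mixture
CELLS_PER_UL = 1_000         # both suspensions at 1 x 10^6 cells/mL

def mixture(molt4_percent, total=TOTAL_CELLS):
    """Return (PBMC cells, Molt-4 cells) for a given Molt-4 percentage."""
    molt4 = round(total * molt4_percent / 100)
    return total - molt4, molt4

for pct in (100, 10, 1, 0.1, 0.01):
    pbmc, molt4 = mixture(pct)
    print(f"{pct}%: PBMC {pbmc} cells ({pbmc / CELLS_PER_UL} uL), "
          f"Molt-4 {molt4} cells ({molt4 / CELLS_PER_UL} uL)")
```

At 0.01% the computed Molt-4 volume is only 0.1 μL. This suggests why such low fractions are more practical to reach by stepwise dilution than by direct mixing.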

(4. RNA Extraction and Measurement of Amount of RNA)

Total RNA was extracted and purified from each serially diluted cell suspension using an RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany) and eluted in 20 μL. The amount of RNA was quantified by absorbance at 260 nm (A260) using an Agilent 2100 Bioanalyzer (Agilent). FIG. 26 shows an image of RNA electrophoresis. The amount of RNA obtained from each sample is shown in Table 1-4B.

TABLE 1-4B
Sample   Concentration (ng/μL)   Ratio (A260/A280)   Total amount of RNA (μg)
100%     122                     2.0                 1.22
10%      130                     1.9                 1.3
1%       82                      1.7                 0.82
0.1%     62                      0.8                 0.62
0.01%    30                      0.8                 0.3

(5. Synthesis of Complementary DNA and Double-Stranded Complementary DNA)

The extracted RNA samples were used for adaptor-ligation PCR. First, to synthesize complementary DNA (cDNA), the BSL-18E primer and 3.5 μL of RNA were mixed and annealed at 70° C. for 8 minutes. After cooling on ice, reverse transcription was performed in the presence of an RNase inhibitor (RNasin) with the following composition to synthesize cDNA.

TABLE 1-4C Synthesis of complementary DNA
Reagent                                    Volume (μL)   Final concentration
RNA solution                               3.5
200 μM BSL-18E                             1.5           30 μM
Total                                      5             70° C., 8 minutes
5× First strand buffer                     2             50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂
0.1 M DTT                                  1             10 mM
10 mM dNTPs                                0.5           500 μM
RNasin (Promega)                           0.5           2 U/μL
SuperScript III™, 200 U/μL (Invitrogen)    1             20 U/μL
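Each final concentration in these reaction tables follows from C_final = C_stock × V_added / V_total. As a consistency check, the sketch below recomputes the Table 1-4C values, assuming the listed final concentrations refer to the complete reverse-transcription reaction, whose volume is the sum of the listed components (10 μL). The list layout and function name are illustrative; the RNasin stock concentration is not given in the table, so it is skipped.

```python
# (component, stock concentration or None if not stated, volume added in uL).
# Units of each result follow the stock: uM, x, mM, or U/uL.
RT_MIX = [
    ("RNA solution", None, 3.5),
    ("BSL-18E primer (uM)", 200.0, 1.5),
    ("First strand buffer (x)", 5.0, 2.0),
    ("DTT (mM)", 100.0, 1.0),          # 0.1 M stock
    ("dNTPs (mM)", 10.0, 0.5),
    ("RNasin (U/uL)", None, 0.5),      # stock concentration not given in the table
    ("SuperScript III (U/uL)", 200.0, 1.0),
]


def final_concentrations(mix):
    """Apply C_final = C_stock * V_added / V_total to every component with a known stock."""
    total = sum(volume for _, _, volume in mix)
    finals = {name: stock * volume / total
              for name, stock, volume in mix if stock is not None}
    return total, finals


total, conc = final_concentrations(RT_MIX)
print(f"reaction volume: {total:g} uL")
for name, value in conc.items():
    print(f"  {name}: {value:g}")
```

The same function applies unchanged to the second-strand, ligation, restriction, and PCR tables by swapping in their component lists.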

The cDNA was then incubated at 16° C. for 2 hours in the following double-stranded DNA synthesis buffer in the presence of E. coli DNA polymerase I, E. coli DNA ligase, and RNase H to synthesize double-stranded cDNA. T4 DNA polymerase was then added and allowed to react at 16° C. for 5 minutes to perform an end-blunting reaction.

TABLE 1-4D Synthesis of double-stranded complementary DNA
Reagent                                         Volume (μL)   Final concentration
Complementary DNA reaction solution             9
Sterilized water                                46.5
5× Second strand buffer                         15            25 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.15 mM β-NAD⁺, 1.2 mM DTT
10 mM dNTPs                                     1.5           0.2 mM
E. coli DNA ligase, 10 U/μL (Invitrogen)        0.5           0.067 U/μL
E. coli DNA polymerase, 10 U/μL (Invitrogen)    2             0.27 U/μL
RNase H, 2 U/μL (Invitrogen)                    0.5           0.013 U/μL
Total                                           75            16° C., 2 hours
T4 DNA polymerase, 5 U/μL (Invitrogen)          1             0.067 U/μL; 16° C., 5 minutes

After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the double-stranded DNA was incubated overnight at 16° C. with the P20EA/10EA adaptor (Table 1-4E) and T4 DNA ligase in the following T4 ligase buffer for the ligation reaction.

TABLE 1-4E Adaptor ligation reaction
Reagent                                       Volume (μL)   Final concentration
Complementary double-stranded DNA solution    12.5
T4 ligase buffer                              5             50 mM Tris-HCl, pH 7.6, 10 mM MgCl₂, 1 mM ATP, 5% PEG5000, 1 mM DTT
50 μM P20EA/10EA adaptor                      5             10 μM
T4 DNA ligase, 1 U/μL (Invitrogen)            2.5           0.1 U/μL
Total                                         25            16° C., overnight

The adaptor-ligated double-stranded DNA, purified on a column as described above, was digested with NotI restriction enzyme (50 U/μL, Takara) using the following composition to remove the adaptor ligated to the 3′ terminus.

TABLE 1-4F Restriction enzyme treatment
Reagent                                       Volume (μL)   Final concentration
Complementary double-stranded DNA solution    34
10× restriction enzyme buffer                 5             50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM DTT, 100 mM NaCl
0.1% BSA                                      5             0.01%
0.1% Triton X-100                             5             0.01%
NotI, 50 U/μL (Takara)                        1             1 U/μL
Total                                         50            37° C., 2 hours

(6. PCR)

The 1st PCR amplification was performed on the double-stranded cDNA using the common adaptor primer P20EA shown in Table 1-1 and a TCR α chain or β chain C region-specific primer (CB1). Twenty cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds, and 72° C. for 1 minute) were performed with the composition shown below.

TABLE 1-4G 1st PCR amplification reaction composition
Reagent                     Volume (μL)   Final concentration
2× ExTaq Premix (Takara)    10            10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer          0.5           250 nM
10 μM CB1 primer            0.5           250 nM
Double-stranded cDNA        2
Sterilized water            7

The 1st PCR amplicon was then used as the template for nested PCR with the P20EA primer and the C region-specific primer CB2, using the reaction composition shown below. Twenty cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds, and 72° C. for 1 minute) were performed.

TABLE 1-4H 2nd PCR amplification reaction composition
Reagent                     Volume (μL)   Final concentration
2× ExTaq Premix (Takara)    10            10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer          1             500 nM
10 μM CB2 primer            1             500 nM
1st PCR amplicon            2
Sterilized water            6

Primers were removed from the resulting 2nd PCR amplicon with a High Pure PCR Cleanup Micro Kit (Roche). For analysis on Roche's next-generation sequencer (GS Junior Bench Top System), the 2nd PCR amplicon was diluted 10-fold and used as the template for a 3rd PCR. This amplification used the B-P20EA primer, which is the P20EA adaptor primer with an adaptor B sequence added, and the HuVbF primer, which is a TCR C region-specific primer with an adaptor A sequence and a sample-specific MID tag sequence added. Ten cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds, and 72° C. for 1 minute) were performed.

TABLE 1-4I 3rd PCR amplification reaction composition
Reagent                     Volume (μL)   Final concentration
2× ExTaq Premix (Takara)    10            10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer        1             500 nM
10 μM HuVbF primer          1             500 nM
2nd PCR amplicon            1
Sterilized water            7

(7. Next Generation Sequencing)

Next-generation sequencing was carried out on Roche's GS Junior sequencer. Specifically, emulsion PCR (emPCR) was carried out with a GS Junior Titanium emPCR Kit (Lib-L) according to the manufacturer's protocol at a DNA-to-bead ratio of 2 copies per bead (cpb). After emPCR, the beads were recovered by bead enrichment, and a sequencing run was performed with the GS Junior Titanium Sequencing Kit and PicoTiterPlate Kit according to the manufacturer's protocol.
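The cpb setting fixes how many library molecules are added per capture bead. A common way to reach a target cpb is to convert the library's mass concentration into molecules per microliter using an average of about 660 g/mol per base pair of double-stranded DNA. The sketch below shows only this arithmetic; the bead number, amplicon length, and concentration are hypothetical inputs, and the kit protocol remains the authority on actual bead counts and dilution steps.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 660.0       # approx. g/mol per base pair of double-stranded DNA


def copies_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Number of dsDNA molecules per uL for a library of the given length."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (length_bp * BP_MASS)


def library_volume_ul(n_beads: float, cpb: float,
                      conc_ng_per_ul: float, length_bp: int) -> float:
    """Library volume that supplies `cpb` molecules for each of `n_beads` capture beads."""
    return n_beads * cpb / copies_per_ul(conc_ng_per_ul, length_bp)


# Hypothetical inputs: a 500 bp amplicon library at 1 ng/uL and 2 x 10^6 beads.
copies = copies_per_ul(1.0, 500)
volume = library_volume_ul(2e6, 2, 1.0, 500)
print(f"{copies:.3e} copies/uL; add {volume:.5f} uL (dilute the library before pipetting)")
```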

(8. Data Analysis)

The resulting sequence data (SFF file) were sorted into read sequences for each MID tag, and a FASTA-format sequence file was created with the software supplied with the GS Junior (sfffile or sffinfo). The number of effective reads obtained was 11,651. Repertoire analysis software (Repertoire Genesis) was used to collate each read with the reference sequences in the IMGT database, assign its BV and BJ regions, and determine its CDR3 sequence. Two TCR reads were observed from the Molt-4 cells: an in-frame TCR read with a functional sequence (Read 1) and a TCR read carrying a frameshift (Read 2) (Table 1-4J). The two reads were detected at about the same frequency and were inferred to be TCR genes derived from Molt-4 cells. Rearrangement of two TCR loci has already been reported in Molt-4 cells (Cited Reference 1: Tunnacliffe A, Kefford R, Milstein C, Forster A, Rabbitts T H. Sequence and evolution of the human T-cell antigen receptor beta-chain genes. Proc Natl Acad Sci USA. 1985 August; 82(15): 5068-72). The sequence of the functional TCR gene (Read 1) matched the previously reported sequence (Cited Reference 2: Assaf C, Hummel M, Dippel E, Goerdt S, Muller H H, Anagnostopoulos I, Orfanos C E, Stein H. High detection rate of T-cell receptor beta chain rearrangements in T-cell lymphoproliferations by family specific polymerase chain reaction in combination with the GeneScan technique and DNA sequencing. Blood. 2000 Jul. 15; 96(2): 640-6; GenBank accession number: M12886.1).
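The first analysis step, splitting the pooled reads by MID tag, can be sketched as follows. This is an illustrative stand-in for the sfffile/sffinfo step, not a description of how those tools work internally: it assumes an exact MID match at the 5′ end of each read, and the tag sequences and sample labels in `MID_TO_SAMPLE` are hypothetical placeholders, since the MIDs used in this Example are not listed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical placeholder MID tags; the MIDs used in this Example are not listed.
MID_TO_SAMPLE = {
    "ACACACACAC": "100%",
    "AGAGAGAGAG": "10%",
    "ATATATATAT": "1%",
    "CACACACACA": "0.1%",
    "CTCTCTCTCT": "0.01%",
}


def demultiplex(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (read_id, sequence) pairs by an exact MID match at the 5' end.

    The MID is trimmed from each assigned sequence; reads that match no MID
    are kept under "unassigned".
    """
    bins: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for read_id, seq in reads:
        for mid, sample in MID_TO_SAMPLE.items():
            if seq.startswith(mid):
                bins[sample].append((read_id, seq[len(mid):]))
                break
        else:
            bins["unassigned"].append((read_id, seq))
    return dict(bins)


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)
```

Real pipelines usually tolerate a mismatch or two in the tag; exact matching keeps the sketch short.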

TABLE 1-4J
Read   Frame          BV         BJ        CDR3 amino acid sequence
1      In-frame       TRBV20-1   TRBJ2-1   CSARESTSDPKNEQFFG (SEQ ID NO: 1163)
2      Out-of-frame   TRBV10-3   TRBJ2-5   CAISEPTGIRRDPVLR (SEQ ID NO: 1164)
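Read 2 is classified as out-of-frame because its junction shifts the reading frame, so the J region is not read in the frame set by the V region. One simple way to make this call, assuming the nucleotide CDR3 span runs from the first base of the conserved V-region cysteine codon to the last base of the conserved J-region phenylalanine codon, is to check whether its length is a multiple of three and whether it translates without a stop codon. The Python sketch below illustrates that rule; it is a simplification, not the Repertoire Genesis implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate complete codons from the first base; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def classify_cdr3(cdr3_nt: str) -> str:
    """Classify a nucleotide CDR3 spanning the V-region Cys codon through the
    J-region Phe codon: a length not divisible by 3 means out-of-frame, an
    in-frame stop codon means nonproductive, otherwise in-frame."""
    if len(cdr3_nt) % 3 != 0:
        return "out-of-frame"
    if "*" in translate(cdr3_nt):
        return "stop codon"
    return "in-frame"
```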

To determine the detection limit for Molt-4 cells by the next-generation TCR repertoire analysis method, the two Molt-4-derived TCR reads were searched for among the TCR reads acquired from the serially diluted samples (FIGS. 27 (A-D)). Read 1 and Read 2 were detected according to the number of Molt-4 cells in the serially diluted samples. Read 1 was present at 61 reads (3.1%) in the 0.1% sample, and Read 2 was present at 1 read (0.01%) in the 0.01% sample (Table 1-4K). The functional TCR, Read 1, was not detected in the 0.01% sample, whereas Read 2, which is predicted to be nonfunctional, was detected. This suggests that searching for multiple TCR genes derived from a single T cell increases the certainty of tumor cell detection. These results show that the present method can detect tumor cells with high sensitivity.

TABLE 1-4K Detection sensitivity
Sample   Read 1   Read 2
100%     +        +
10%      +        +
1%       +        +
0.1%     +        −
0.01%    −        +
+: detected; −: not detected
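Taking the detection limit as the lowest Molt-4 percentage at which a read was found, the calls in Table 1-4K give 0.1% for Read 1 and 0.01% for Read 2, in line with the limits reported for FIG. 28. The sketch below encodes that reading and also shows the combined call when a sample is scored positive if either Molt-4-derived read is found, which is the rationale given above for searching several TCR genes from one clone. The function names are illustrative.

```python
# Detection calls transcribed from Table 1-4K (keys: Molt-4 %, True = detected).
DETECTION = {
    "Read 1": {100: True, 10: True, 1: True, 0.1: True, 0.01: False},
    "Read 2": {100: True, 10: True, 1: True, 0.1: False, 0.01: True},
}


def detection_limit(calls):
    """Lowest Molt-4 percentage at which the read was detected (None if never)."""
    detected = [pct for pct, hit in calls.items() if hit]
    return min(detected) if detected else None


def detected_by_any(table):
    """Per-dilution call when a sample is positive if any marker read is found."""
    dilutions = sorted({pct for calls in table.values() for pct in calls}, reverse=True)
    return {pct: any(calls.get(pct, False) for calls in table.values())
            for pct in dilutions}


for read, calls in DETECTION.items():
    print(read, "detection limit:", detection_limit(calls), "%")
print("either read:", detected_by_any(DETECTION))
```

With either read accepted as evidence, every dilution in the table is scored positive, whereas Read 1 alone misses the 0.01% sample.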

(Results)

FIG. 26 shows an image of RNA electrophoresis on an Agilent 2100 Bioanalyzer. Total RNA was extracted from each serially diluted cell suspension, and the amount of RNA was measured with the Agilent Bioanalyzer. The RNA was separated with a microchip electrophoresis apparatus to check its quality. The 28S (upper band) and 18S (lower band) rRNA bands were detected in every sample, demonstrating that undegraded RNA was obtained.

FIGS. 27 (A-D) show TCR reads in serially diluted Molt-4 cell samples (SEQ ID NOs: 1165-1324). The TCR reads acquired from each of the 10%, 1%, 0.1%, and 0.01% serially diluted Molt-4 samples are listed. The reads were ranked in descending order of read count, and the top 40 are shown, except for the 0.01% sample, for which ranks 365 to 404 are shown. The TRBV, TRBJ, CDR3 amino acid sequence, and number of reads are given for each read. The functional TCR reads derived from Molt-4 (TRBV20-1/TRBJ2-1/CSARESTSDPKNEQFFG (SEQ ID NO: 1166)) are shown in bold on a gray background. The other Molt-4-derived TCR reads, estimated to be nonfunctional (TRBV10-3/TRBJ2-5/CAISEPTGIRRDPVLR (SEQ ID NO: 1165)), are shown in bold.

FIG. 28 shows the detection sensitivity and number of TCR reads in serially diluted Molt-4 cell samples. Two TCR reads derived from Molt-4 cells were detected (▴: TRBV20-1/TRBJ2-1/CSARESTSDPKNEQFFG (SEQ ID NO: 1166); ∘: TRBV10-3/TRBJ2-5/CAISEPTGIRRDPVLR (SEQ ID NO: 1165)). The figure shows the percentage of Molt-4-derived TCR reads among the TCR reads acquired from each of the 10%, 1%, 0.1%, and 0.01% serially diluted Molt-4 samples. The detection limit was 0.1% for ▴ and 0.01% for ∘.

ANALYTICAL TEST EXAMPLE

Analytical Test Example 1: BCR Repertoire Analysis of Healthy Individuals

In the present Example, BCR repertoires were compared among immunoglobulin classes in a healthy individual.

(Materials and Methods)

(Materials)

The read sets used were obtained by sequencing, on a Roche GS Junior, BCR cDNA obtained without bias from RNA extracted from peripheral blood mononuclear cells of one healthy individual. A separate read set was available for each class: IgM, IgG, IgA, IgD, and IgE.

(Method)

FIG. 30 shows the overall workflow of the method (FIG. 29 shows the TCR analysis scheme).

Previously reported allele sequences were obtained from IMGT for use as a reference database. BLASTN was used for the homology search, with the following parameters set for each region:

V: mismatch penalty = −1, minimum alignment length = 30, and minimum kernel length = 15;

D: word length = 7, mismatch penalty = −1, gap penalty = 0, minimum alignment length = 11, and minimum kernel length = 8;

J: mismatch penalty = −1, minimum hit length = 18, and minimum kernel length = 10; and

C: minimum hit length = 30 and minimum kernel length = 15.

The following criteria were applied, in order of priority, to select the closest reference allele:

1. number of matching bases; 2. kernel length; 3. score; and 4. alignment length.

Then, for each class, the frequency of appearance of each gene name was calculated for each region and compared among the classes. Because IgG and IgA have subclasses, comparisons were also performed among subclasses.
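The thresholds and the allele-selection rule above can be expressed as a filter followed by a lexicographic comparison: hits below the region's minimum lengths are discarded, and the survivor with the most matching bases wins, with ties broken by kernel length, then score, then alignment length. The Python sketch below assumes each BLASTN hit has already been parsed into these four numbers, treats "hit length" as alignment length, and assumes larger is better for every criterion. The `Hit` record and helper functions are hypothetical, not Repertoire Genesis code, and kernel length is taken as given because this text does not define how it is computed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# (minimum alignment or "hit" length, minimum kernel length) per region,
# from the parameter list above.
MIN_LENGTHS = {"V": (30, 15), "D": (11, 8), "J": (18, 10), "C": (30, 15)}


@dataclass(frozen=True)
class Hit:
    allele: str            # IMGT-style allele name, e.g. "IGHV3-23*01"
    matches: int           # number of matching bases
    kernel_length: int
    score: float
    alignment_length: int


def best_allele(region: str, hits: Iterable[Hit]) -> Optional[Hit]:
    """Discard hits below the region's minimum lengths, then pick the hit with the
    most matching bases, breaking ties by kernel length, score, and alignment length."""
    min_aln, min_kernel = MIN_LENGTHS[region]
    kept = [h for h in hits
            if h.alignment_length >= min_aln and h.kernel_length >= min_kernel]
    if not kept:
        return None  # counted as a no-hit
    return max(kept, key=lambda h: (h.matches, h.kernel_length, h.score, h.alignment_length))


def gene_name(allele: str) -> str:
    """Strip the allele suffix: "IGHV3-23*01" -> "IGHV3-23"."""
    return allele.split("*", 1)[0]


def gene_frequencies(assignments: List[Optional[Hit]]) -> Dict[str, float]:
    """Percentage of reads per gene name for one region of one class."""
    counts = Counter(gene_name(h.allele) if h else "no-hit" for h in assignments)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Running `gene_frequencies` separately on the reads of each class (or subclass) gives the per-region frequency tables that are then compared.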

(Results)

FIG. 31 shows the frequency of appearance of C gene names calculated for each of the IgM, IgG, IgA, IgD, and IgE read sets. Only the gene name corresponding to each class appeared, and almost no no-hit reads were observed, suggesting that the read sets analyzed were of sufficient quality.

The D repertoires calculated for each class are shown in Tables 2-3 and 2-4: Table 2-3 compares D repertoires among classes, and Table 2-4 compares them among subclasses. The number of reads is given for each gene name and for each CDR3 amino acid sequence; gene names and amino acid sequences with only one read were omitted. FIGS. 32 (A and B) show the V repertoires and FIG. 33 shows the J repertoires. FIGS. 34 (A and B) compare V repertoires among subclasses, and FIG. 35 compares J repertoires among subclasses. For D, frequencies were calculated for each combination of D gene name and CDR3 amino acid sequence.
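The tabulation described above (read counts per D gene name and per combination of D gene name and CDR3 amino acid sequence, with single-read entries omitted) can be sketched in Python as follows. The input layout, one tuple per read giving its class, assigned D gene (or None when no D gene was assigned), and CDR3 amino acid sequence, is an assumption for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Read = Tuple[str, Optional[str], str]  # (class, D gene name or None, CDR3 amino acids)


def _drop_rare(counts: Counter, min_reads: int) -> dict:
    """Keep entries with at least `min_reads` reads, most frequent first."""
    return {key: n for key, n in counts.most_common() if n >= min_reads}


def d_repertoire(reads: Iterable[Read], min_reads: int = 2):
    """Per class, count reads for each D gene name and for each (D gene, CDR3) pair.

    Reads without a D assignment are grouped as "(unidentified)"; entries
    with fewer than `min_reads` reads are omitted.
    """
    per_gene: Dict[str, Counter] = {}
    per_pair: Dict[str, Counter] = {}
    for cls, d_gene, cdr3 in reads:
        gene = d_gene or "(unidentified)"
        per_gene.setdefault(cls, Counter())[gene] += 1
        per_pair.setdefault(cls, Counter())[(gene, cdr3)] += 1
    return ({cls: _drop_rare(c, min_reads) for cls, c in per_gene.items()},
            {cls: _drop_rare(c, min_reads) for cls, c in per_pair.items()})
```

Passing subclass labels (for example IGHA1 or IGHG1) instead of class labels in the first tuple field yields the subclass comparison in the same way.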

(Table 2-3) Comparison of D repertoires among classes; vertical axis: frequency (%), horizontal axis: gene name

TABLE 2-3 num num SEQ ID gene name reads CDR3 amino acids reads NO: IgAIGHD2-21 74 CAKDMCGLWASCGGDCYSRRTTSLTT 41 61 CAKDMCGLWASCGGDCYSRRTASLTT5 62 CARGPNMAFVVVTAILMLLIS 4 63 CARAPDCGGSTCYSHPYYGMDVW 4 64CARSHIVVVTAIPLEMLLIS 2 65 CARDPRIVVVAPATHTPTTVWTS 2 66 IGHD6-13 41CGRSRHSSSWQILTP 11 67 CANGGLAAAGDHLTT 5 68 CALCPTPIAAAGSVTT 5 69CARAPSIPVAGIGYHFDHW 3 70 CALCPNPYSSGWFCNYW 3 71 CARAPSIPVAGIATTLTT 2 72IGHD3-3 37 CIYDFWSGGPHPTLTT 11 73 CARIVNTEGFWSGFLTP 4 74 CTRRGGVVIICLTT2 75 CIHTGNDFWTGTNYGLTS 2 76 CARIVNTEGFGVVFLTP 2 77CAKDRFSGRGRFEFMEWLTPLTT 2 78 CAKDRFSEGKVQFMEWLTPLTT 2 79 IGHD3-22 36CARRPIPPLTMRVVVIPLTS 5 80 CARDPPMPMIVVQTLTT 2 81 CARDPPMPMILVQTLTT 2 82CAKILITMILVVSLMLLIS 2 83 IGHD3-10 33 CAREIRGTTMVRELTTSTATWTP 6 84CVRTYYFGLGDIITEITSTVWTS 3 85 CATYYYGSGSAGHNFDYW 2 86 CARRTYYYGSTNLTT 287 CARMVTDYYGSGNRGWFDPW 2 88 CARGPGLSVMIRGVITTPNHILIT 2 89 CARDYYGSGVMTL2 90 IGHD2-15 29 CARAPDCGGGTCYSHPTTVWTS 5 91 CARARIVVVVPATLTPTTVWTS 3 92CARAPDCGGGTCYSHPYYGMDVW 3 93 CAKDLAPLKSCSRGGCYPYYYGLDIW 3 94 IGHD1-26 22CARGPATAILGATPSLTP 3 95 CARDDSASYSRGTT 3 96 CVRHDYSDNDLSTNWFGP 2 97IGHD5-5 19 CMGPGDTAI 7 98 CARRPREMESAMVLSLTT 2 99 IGHD3-9 19CAHSAPYYDILSRNRARSWKDFDNW 3 100 CAHSAPIMIFCLVTAHEVGRILTT 3 101CATVALLRYFDWSSTR 2 102 IGHD6-19 14 CARGRRSLPAIVYSSGPDRPNWFDPW 6 103CTSAAVASSSGWPLRGVWTS 3 104 IGHD2-2 14 CARAPLCSSASCHLQLDYW 5 105 IGHD4-2311 CARGAGYGGNSGVRTT 9 106 IGHD4-17 10 CARTLYGDFVDF 2 107 IGHD3-16 9CAKGVLSSGGVIATLPGSTP 3 108 CARGFGARGVILT 2 109 IGHD2-8 8CANVGGADRNYCINGVRHNPNYLTT 5 110 IGHD6-6 5 IGHD5-12 5 CARHVNGYDYLFPFTSW 3111 IGHD1-1 3 CARGGSQLERRRPLVTT 3 112 IGHD5-24 2 (unidentified) 662CARETVGGTLTT 19 113 CARTSSGHDPPIITGWTS 13 114 CAKGHQVRLRGRTGTSIS 11 115CARSPIWFGSHRFTTTWRS 9 116 CARDPLETGATSLII 9 117 CAKLGNRPGFTEWDHWFGPW 9118 CAGAPDCGVGAAPLTSTTVCTS 8 119 CVRDPHETGATTLIT 6 120 CARIRKEVGAPPITWTS6 121 CARGSWSGAAFYSLTT 6 122 CARDPNKFRTNHLSTT 6 123 CAGIGGATSTTTTTTWTS 6124 CATVPELTDISLPRLMALIS 5 125 CARVWGKHTLTT 5 126 CARRAAPHDYGHVLIF 5 127CARDPNKFRPNHLSTT 5 128 
CARAGRELLRALMTT 5 129 CARAGAELLRALMTT 5 130CARAEDYYDTEGYFYLTP 5 131 CAHRTNYSTNRYGAFTTLTS 5 132 CVRHDGSFTKTGSTP 4133 CVRDPQETGATTLIT 4 134 CVKIGAAH 4 135 CATQCLGGAGLTTTTAPWTS 4 136CARRTYYSGSTNLTT 4 137 CARRTTRETGSSIS 4 138 CARLRCSNDNCAGHLYYYFSGLDIW 4139 CARASLPRGLLIS 4 140 CAKGRGRRAAGKFLTT 4 141 CVRQYGLGSGSLTP 3 142CVKIRNLIGFTGSTP 3 143 CTRDGVRGDLNPTLNV 3 144 CTKGGGRKTAGKFLTP 3 145CDKAKVTADLRT 3 146 CATVPELPDISLPRLMALIS 3 147 CATVFGRRYRLLTT 3 148CARYRAAYPRRAWTS 3 149 CARTIGFEIAMTGGLGALTP 3 150 CARTARTGDL 3 151CARRDPPVRASLSTTLTS 3 152 CARIGHEFYSLTYSVNDVFDLW 3 153CARFQRYCRGGSCSATLDAFDKW 3 154 CARDLGERRDGEVINWFDAW 3 155 CARDLAVWATLTT 3156 CAKGAGRRAAGKFLTT 3 157 CAKDVEPTVTLYNHFDP 3 158 CAKDFNWEGIT 3 159CAHRTNYSTNRYGGLYYFDFW 3 160 CVRGVGTILWLTI 2 161 CVRFIGAYSNNWYPGYFDYW 2162 CVRDAGPGGSLTS 2 163 CTTGFSGSTACHWDHTACHWDDAFAMW 2 164CTHAVESLLGTTSTS 2 165 CIHTGNDFGPGPTMVWTS 2 166 CGVGRGDNDVDFKFKW 2 167CATRESPLTT 2 168 CATAGIELWRAGSTP 2 169 CASQSQNYYYYYMDVW 2 170CASKKEILWAGPNLTT 2 171 CARYRIAMATSPYFDYW 2 172 CARVRCGLVASEGVLIS 2 173CARTNFGSGGYILGDTTMVWTS 2 174 CARSAGYLHRRTS 2 175 CARRTYYSGSTNFDYW 2 176CARRDLPFGASLSTTLTS 2 177 CARRAAPMTTGMFLIF 2 178 CARPGFSYGPRLTP 2 179CARLRGGFPPVVKRVEVFLLTS 2 180 CARKKIPTAGYSSLTT 2 181 CARGSWMGRPFISLTT 2182 CARGRFARGGDDSLIS 2 183 CARGLRWADN 2 184 CARGGTSGLILDTTSTPWTS 2 185CAREMHIDSLTVGRAFDIW 2 186 CARDVPDIYSSGATDC 2 187 CARDPSYLPTPALKT 2 188CARDPNKFRPNHFVDYW 2 189 CARDLGTTNYWLDTW 2 190 CAKQRASGNSLTI 2 191CAKEPKIVGRRRTTLIT 2 192 CAKDLGVCSEGAASSLVLIS 2 193 CAKAPGDLCRSTP 2 194CAHSAPYYDICLVTAHEVGRILTT 2 195 CAGLIGRFIPLTT 2 196 CAGIRGSNIYYHYYYMDVW 2197 CADLPGIIGGEIT 2 198 IgD IGHD3-22 432 CARHDTPRVYYDSSGYYYGVDYFDYW 168199 CASMDTKNYYDSSGSQPRRSYYFDYW 39 200 CAQYYYDSSGYYYYYGMDVW 25 201CARISYYYDSSGYYYRDW 21 202 CARVRGITMIVVVTTLTT 17 203CASMDTKNYYDSSGSQPGGRTTLTT 7 204 CASMDTKITMIVVVPNPGGRTTLTT 7 205CARYNYTIVVGP 5 206 CARISYYYDSSGYYTVT 5 207 CARIRYYYDSSGYYYFDYW 4 208CARHVRDGMIVVAEIDYW 4 209 CARVAVRSYYPFGMDVW 3 210 CARLPLDSSGYYLTT 3 211CARLPLDSSGYYFDYW 3 
212 CASMDTKNYYDSSGSQPRRSHYFDYW 2 213CARYRITMIVVVITTVT 2 214 CARYNYYDSSGSW 2 215 CARVRGYYDSMSMSALLMS 2 216CARVRGTMIVVSMSALLMS 2 217 CARVRGNYYDSSGYYFDYW 2 218 CARSGRVGARPKLYYW 2219 CARTSYYYDVVVITTVT 2 220 CARHVRDGMIVVAEMTT 2 221CARHDTPRAYYDSSGYYYGVDYFDYW 2 222 CAREFFGTRTMIVVVTYFDYW 2 223CAQYYYDSMVITTTTVWTS 2 224 IGHD3-10 217 CARGVRGVIINTFTTLTT 118 225CTWFGEATTTVWTS 25 226 CARGGSGVIINTFTTLTT 10 227 CARCAGGSGSYYYYYMEVW 9228 CVKAGFGELLIGGDRTT 6 229 IGHD3-16 169 CARGGSGSYYKHVYYFDYW 6 230CARCAGGSGVTTTTTWRS 6 231 CTWFGGGYYYGMDVW 5 232 CTWFGGATTTVWTS 3 233CARLDGSGRRGTALTT 3 234 CARLDVRGGRGTALTT 2 235 CARGGSGVIINTFTTLTM 2 236CARCAGVRGVTTTTTWRS 2 237 CARRVMITFGELSSTTLTT 140 238 CARRVMITFGGVIVTTLTM3 239 CARRVMITFGGVIVDYFDYW 3 240 CARRVMITFGGVIVDTLTT 3 241CANPTSFRQCSMIT 3 242 CARRVMITLGELSSTTLTT 2 243 CARRAMITFGELSSTTLTT 2 244IGHD6-19 134 CARHGIAVAYYFDYW 28 245 CARVSSGWSGGNPAPATLTT 22 246CARHVGSGWVYFDYW 12 247 CARRDDSSGWYGHDYW 11 248 CARGYSSGFGDALIP 8 249CARVSSGWSGVTPAPATLTT 7 250 CARRDDSMAGTAMTT 4 251 CARGYSSGFGDAFDTW 4 252CARGIRYSSGWYGSNWFDPW 4 253 CARRDDSSGWYGHDY 3 254 CARRDDSSAGTAMTT 3 255CARHGIAWPTTLTT 3 256 CARHVGSGWVTLTT 2 257 CARHGIAVAYYLTT 2 258 IGHD5-5122 CARAGGYSYGYLLPLMLLIS 29 259 CARRKRELLWVTTTTTTWTS 20 260 CARQKSATVWTS17 261 CARVNLEQLWYRRGTTTTVWTS 8 262 CARVNLEQLCTGRGTTTTVWTS 7 263CARFYNRRMLSTAMVDIDYW 5 264 CARVNLEQLWYRTGYYYYGMDVW 4 265CARLFNYAREYGMDVW 4 266 CARVNLEQLWYRTGSTTTVWTS 2 267 CARVAPRLTT 2 268CARLFNYARGVRVWTS 2 269 IGHD1-26 85 CARHVGSGWVYFDYW 16 270CARAQYSGATECKGTLTT 10 271 CARHSLTPGFLLNYFDYW 8 272 CTRSRGLSGTYYNPDNDYW 7273 CARAQYSGSYRMQRYFDYW 7 274 CARHVKVLGATVGFDYW 3 275 CARAQYSGSYRCKGTLTT3 276 CTRSRGLSGTTTIQIMTT 2 277 CARPSIVGATECKGTLTT 2 278 CARHVAVAGSTLTT 2279 CARAQYSGSYRMQRYLTT 2 270 CARAQYSGSYRMQGTLTT 2 281 CAHTHRSVGATA 2 282IGHD6-13 73 CAKVTHAYSSTWYHGDYYYYGMDVW 28 283 CARGHLPYSSTDKGHWFDPW 16 284CARDSSHGYSSSWPDYW 4 285 CAKLPMRIAAPGTMGTTTTTVWTS 3 286 CARDSSTGIAAAGPTT2 287 IGHD2-15 46 CLASRPLWFGDPNGSTP 5 288 CAKDSSRYCSGGSCKYFDYW 4 289CAKNPASTGYGSFDYW 
3 290 CAKIRPVLVTEALTI 3 291 CAKDSSRYCMVVAANTLTT 3 292CARLILGYCSGVGCTPT 2 293 CARDRGSGGSCYVLTT 2 294 CARDGVVVVLLLLTT 2 295CANWARVVVASGTTTTWTS 2 296 CAKNPPVLVTEALTI 2 297 CAKNPAVLVTEALTI 2 298IGHD3-3 25 CASKKKFLEWPETTTTTVWTS 6 299 CARKEFLEWPETTTTTVWTS 5 300CAKDINPDYDFWSGSHLPYDAFDIW 5 301 CARKKNFLEWPETTTTTVWTS 2 302CAKDINPDYDFWSGSHLPYDALIS 2 303 CARDARYCSSTSCYSFPYWYFDLW 2 304 IGHD2-2 18CARDARYCSSTSCYSFPTGTSIS 2 305 CARDARYCSSTSCYRFPYWYFDLW 2 306CARRLGRVATTYYMDVW 5 307 IGHD5-12 15 CARRLVEWLRPTTWTS 3 308CARRVGVEWLRPTTWTS 2 309 CTRDIVLTTPREWYFDLW 5 310 IGHD2-8 13CTRDIVLTTPGSGTSIS 3 311 CARDLIYGDYPTTTWTS 4 312 IGHD4-17 5CARGAAPGVETGSTP 264 313 (unidentified) 989 CARHTLFSDSSAPPRGVYYYYYMDVW 60314 CARGATVGVETGSTP 52 315 CANWAGVTGTVPLTT 41 316 CARQKSATVWTS 33 317CARAGIQLEVFTLTT 21 318 CARLDGSGGRGTALTT 20 319 CARRKRELLWVTTTTTTWTS 19320 CANPDLISAMFDDYW 14 321 CARHQCSGEACFYYYGMDVW 13 322 CANPTSFRQCSMTT 12323 CARGGTIPFPWTS 9 324 CARAQGGAHTTLTT 9 325 CTRDTGSSAGATDLW 8 326CAKAVAVTGSHFDYW 8 327 CARGAAPGVETGSTL 7 328 CARAQGRGTYYFDYW 7 329CARAIIRYFND 7 330 CAIPPDGSRRSPLTT 7 331 CVREGFCGAHGCYSLTYW 6 332CARHTLFSDSSAPPRGSTTTTTWTS 6 333 CARGYSSASVMLLIP 6 334 CVREGFVVLMAVILLPT5 335 CARSGPRGLTT 5 336 CARHSLTPGFLLNYFDYW 5 337 CARGWELDRW 5 338CARDPSSLYYYYYGMDVW 5 339 CARAIIHISMT 5 340 CANPDSFRQCSMTT 5 341CAIPRTEGRRSPLTT 5 342 CARTFGDSAALIS 4 343 CARLFNYAREYGMDVW 4 344CARHTLFSDSSAPPTGGLLLLLHGRL 4 345 CARHTLFRIVVPLPGGSTTTTTWTS 4 346CARDLGESSSTTLTT 4 347 CARAIIRYFNDW 4 348 CAPGGLRLGVETGSTP 4 349CARTLDYGIATGSIIMVWTS 3 350 CARRLGRVARPTTWTS 3 351 CARLFNTPGSTVWTS 3 352CAIPPTEGRRSPLTT 3 353 CAIPPDGRQTVPFDYW 3 354 CTRDTGSPPEPLTS 2 355CATSRGGRGTT 2 356 CATSGGVGDY 2 357 CATPTSFRQCSMTT 2 358 CATAAGLWSSSTTWTS2 359 CATAAGLWSSKYYMDVW 2 360 CASKKSATVWTS 2 361 CARVGSSTMLLIS 2 362CARVGSSPMLLIS 2 363 CARVAARPMLLIS 2 364 CARTFVIRLLLIS 2 365CARRLGRVLRPTTWTS 2 366 CARRKKGSCYRVTTTTTTWTS 2 367 CARLFNYARSTVWTS 2 368CARHVGMAGSTLTT 2 369 CARHTLFSDSSAPPRGGLLLLLHGRL 2 370CARHTLFSDSSALPRGVYYYYYMDVW 2 371 
CARHTLFSDSSALPGGSTTTTTWTS 2 372CARHQCSGEACFYYTAWTS 2 373 CARGYSMASVMLLIP 2 374 CARGWELDR 2 375CARGQMGATTLIDYW 2 376 CAREWPTGTRGMW 2 377 CAREWPTGTRGMC 2 378CAREWPTGNQRGCG 2 379 CARDSTQTT 2 380 CARAQYGGATECKGTLTT 2 381CARAQGAGAHTTLTT 2 382 CARAKYDISMT 2 383 CARAIYDISMT 2 384 CARAIIRYSMT 2385 CARAHGAGAHTTLTT 2 386 CANWPGVTGTVPLTT 2 387 CAKRWGSSSWTT 2 388CAKIRPVLVTEALTI 2 389 CAKDLHSYGYLGAFDIW 2 390 CAIPPDGRQTSPLTT 2 391CAIPPDGRQTAPLTT 2 392 CAHTHRSVGALP 2 393 CAGVAPRLTT 2 394 IgE IGHD4-173475 CARGFDGGWEHW 3103 395 CARGFLMVAGEHW 113 396 CARGFLMVAGST 25 397CARGFDGGWGAL 21 398 CARGFDGGWEH 17 399 CARGFDGAGST 12 400 CARGFDGGWEYW10 401 CARGFDGGWEHR 10 402 CAGGFDGGWEHW 9 403 CARGLMVAGST 8 404CARGFDGGWST 8 405 CARGFDGGREHW 8 406 CVRGFDGGWEHW 7 407 CARGLDGGWEHW 7408 CARGFDGGWGHW 7 409 CARGFDGGWGALG 6 410 CARGFDGAGEHW 6 411CARGSDGGWEHW 5 412 CARGFGGGWEHW 5 413 CARGFDGSWEHW 5 414 CARGFDGGWERW 5415 CARGFDDGWEHW 5 416 YARGFDGGWEHW 4 417 CARGFDGGWIKHW 4 418CARGFDGGSGAL 4 419 CARGFVWWLGGT 3 420 CARGFDSGWEHW 3 421 CARGFDRWLGAL 3422 CARGFDGGWVHW 3 423 CARGFDGGWEAL 3 424 CARGFDGDWEHW 3 425 CARGFDGAGSM3 426 CARSFDGGWEHW 2 427 CARGFLMVAGEHG 2 428 CARGFLMVAGEH 2 429CARGFDVAGST 2 430 CARGFDGGWEHS 2 431 CARGFDGGWEHG 2 432 CARGFDGGCEHW 2433 CARGFDGAGEH 2 434 IGHD1-7 3 CARGFDGGWEHW 3 435 (unidentified) 166CARGFDGGWEHW 124 436 CARGFLMVAGEHW 14 437 CARGFDGGWEHS 5 438CARGFLMVAGST 4 439 CARGFVWWLGGT 2 440 CARGFLMVAGSTG 2 441 CARGFDGGSGAL 2442 CARGFDGAGST 2 443 IgG IGHD3-10 60 CARGRYAGGVIITALTP 13 444CARLPRMVRGNWFHP 8 445 CARGAWAVRGVISWAGSTP 6 446 CSREVGRDYYGSGVIEITWTS 4447 CAGSGSGSLLTTVWTP 4 448 CSREVGRDYYGSGSYRNYMDVW 3 449CVSITNSLLWFGELLIFDCW 2 450 IGHD3-22 59 CAKITSMIVVLIPTMMLLMS 20 451CARGSRARFSSDTSGYQYFDYW 4 452 CARGVYLYYDSHAYSVLTT 3 453 CARVNYYDSVVLTT 2454 CARVNYYDSSRIDYW 2 455 CARLPPFNNDDSSSYALYLTT 2 456CARHSNYYYDTSGYRVLDAFDIW 2 457 CARGGMDSYGYFYVGHYDYW 2 458 CARDPDF 2 459CAKITSMIVVLTPTMMLLMS 2 460 IGHD6-13 34 CTRQEESSAAGTGGTSSP 7 461CALCPTPIAAAGSVTT 7 462 CATSEGDPVAAAGTKSWFDSW 3 463 
CARLALLYGSSRYGATLTT 2464 CARGPSSTWYSFDYW 2 465 IGHD2-15 31 CAKKEFILVVVITMMSLLMS 6 466CAKDMTAKACSDYW 3 467 CARVMGCRGGRCDFRAFDIW 2 468 CARRFCSGGICYFLTT 2 469CALTGLNGRSCYSELLIS 2 470 CAKEGVYFSGGNHYDVAFNVW 2 471 IGHD3-3 28CARPSRCCYSGGGRLTL 4 472 CAHSVGFILDFWSGYQNNWFDPW 4 473 CARPSRCCYVRGGRLTL2 474 CAMGPTIFGVVFLGSLTS 2 475 CAHSVGFILDFWSGYQNNWFDPG 2 476CAHSVGFILDFGVVIRTTGSTP 2 477 IGHD6-25 21 CARVKGGIAGMAWTS 19 478 IGHD5-521 CARGVDTTMVRSTTLTT 7 479 CARQDPYCSTSNCTMGGAMTLTT 5 480 IGHD5-24 21CARTDGIRDGYNLHRVLTT 2 481 CARTDAIRDGYNLHRVFDYW 2 482 CARGKRDAYNYYSHLDSW2 483 CARGKEMPTITTLILTP 2 484 IGHD3-16 19 CVRQSPLDDVWGVFAPVGSTP 11 485CVRQSPLDDVWGVFAPVGSTL 2 486 IGHD4-17 14 CARHPKPPTVTSATT 2 487CAKGENTVTTGQEYW 2 488 CAKGENTVTTGQEY 2 489 IGHD3-9 12CAREGRNYDSLTGDPWFDPW 2 490 IGHD2-2 12 CASRYCTSDRCLGASGKPSFDTW 2 491CARHSLAYCSTTSCAVFDYW 2 492 CARHGFEGREVVPPAMNEYYYYYMDVW 2 493 IGHD1-7 11CARGDCTTINCNTHSDYYGLDVW 3 494 CARTVGTGTTNGYLTS 2 495CAREIVLLSTATLTPTTTVWTS 2 496 IGHD5-12 10 CARQDSGYDYGYYHNGMDVW 2 497IGHD4-23 9 CARGAGYGGNSGVRTT 6 498 IGHD2-21 9 IGHD6-19 8 CARDLGSGWFRFDP 2499 CARDLGSGWFGSTP 2 500 IGHD2-8 7 CAKSHHCTNGVCHPPRFGQRSTP 2 501IGHD1-26 6 IGHD1-20 2 (unidentified) 773 CARGATVGVETGSTP 32 502CARKGSRHGGSTP 28 503 CARQNGPSIGGGSTP 23 504 CARGATPGAETGSTP 23 505CAKDTLGGMGGLTS 13 506 CARVRVLPEGVLISLRPLGSTTITWTS 11 507CARGGPKKVVTAAHLSP 11 508 CSTLGLGPPGGQTT 10 509 CARDHYDTRGVRMLLIS 10 510CATDRDSSWGTSLTT 9 511 CARMVRGGGRTSSGYYYYYMDVW 9 512 CARDGVWDLPTTLTT 9513 CVRMGPPCQLAGRSSSLTS 6 514 CTMATVGHGLRRCFGKSTATLTS 6 515CARRGGSTVTTGTSIS 6 516 CSTLGLGPPGGLTT 5 517 CMGPGETAI 5 518CARVSMIRFRVWGLWTS 5 519 CARVQRGAVVIPTT 5 520 CARRRYNDLGAPNWVDPW 5 521CARGEDCGGGRCNNLPTTVWTS 5 522 CAKRKLAPPRKFTTLTT 5 523CATLEGGAPPDLRRAEAFLLIS 4 524 CARQDPYCSTSNCTMGGAMTLTT 4 525CARGKDCGGGRCNNVPYYGMDVW 4 526 CAKDGHKLTGTTTRTS 4 527 CAILPETQWYPRLTT 4528 CVRDLGAITPVFSTS 3 529 CVHRPRWLNVVPT 3 530 CVHRPRWLNVVPN 3 531CARSFVVKVHAHCGAVLSST 3 532 CARRLNVAVVVPAYVGWFDPW 3 533 CARLGKNHSQGVDYW 3534 CARGPGGVWDRLSLTS 3 535 
CARGKDCGGGRCNNVPTTGWTS 3 536CARGFMVQASSVRLKRGQFLADSW 3 537 CARGDWGTVTLATT 3 538 CARDWEWQQRLNYFDP 3539 CARDNQPWRDARNLGGAFDVW 3 540 CARDGLRPPPFMVTIQRGGLTT 3 541CARAVGGFNSGWPSIGVPARSTP 3 542 CARAVGGFNSGWPSIGVPARSTL 3 543CAKVDETVVLPAALLTP 3 544 CAKSPKPWSQLVSTPIMPTPWTS 3 545 CVRRAAGGRSGLTT 2546 CVRPPPTVPGTAGSTP 2 547 CVRESTFYYFGPW 2 548CVRDDDYSRTWYMGQGASSDYGMDVW 2 549 CVKWVSGVLTSLTT 2 550 CVALFVPAGSTL 2 551CTMATVGHGATTLFREVHRNTDFW 2 552 CSTLGLGPRGADYW 2 553 CSRTGGRLLIS 2 554CSKVGRILKLIT 2 555 CKVAVEMVLMY 2 556 CGKFLGTTVASS 2 557CATSGRSSAWYPDVFDIW 2 558 CATNYCRGISCYPAPLTT 2 559 CATLTGGAPPDLRRAEAFLLIS2 560 CATEGTGAVTPFTT 2 561 CATAPGGTSYT 2 562 CASRPSWGSSFDFW 2 563CASRPPGAAALTS 2 564 CASMIALHHTLTS 2 565 CARYSPVDPSTLDFW 2 566CARVLDSSAHWYFDDW 2 567 CARRRYNDLGAPTGSTP 2 568 CARQNGPSIGGGSTL 2 569CARQHSEWEILRLVFDHW 2 570 CARMVREEAERRPAIIITTWTS 2 571 CARLPRMVRVTGSTP 2572 CARIDYVSTWYYDQW 2 573 CARICAEREFLSLLTP 2 574 CARGPGWGMGSTKFDCW 2 575CARGGKSATGANYHQFFDCW 2 576 CARGDCTTINCNTHSTTTVWTS 2 577 CARGATVGVETGSTL2 578 CARGATLGVETGWTP 2 579 CAREYYGILYGYYFDYW 2 580 CARDWEWQQRLNYFDPW 2581 CARDNQPWRDARNLGVHLMC 2 582 CARDHYYDERNQGPDW 2 583 CARDGGLAGTGTLEY 2584 CARAGLVLGPYGMDIW 2 585 CARAGGHGTWTS 2 586 CAKVAETLVSTGFDSYYAYSMDVW 2587 CAKTYDYGSRGFSILLIS 2 588 CAKSLRVGGDVFEIW 2 589 CAKSDYFDP 2 590CAKGRGRLVTIATTLTT 2 591 CAKGAGRRAAGKFLTT 2 592CAKAKRRSLGMQTLPTLRGRSDGFDVW 2 593 CAKAHFPGDLPSFSSIS 2 594CAKADCGTGCFIVDDW 2 595 CAHQQWRPGRRGFDYW 2 596 IgM IGHD6-13 148CARTYSSWYRGPLSP 24 597 CTRQEESSAAGTGGTSSP 16 598 CARPIAAAGSRGFGTLTT 15599 CAQRRPSSSTWYAPTLTT 7 600 CAQRRPNSSTWYAPTLTT 4 601CARDLGGYSSSWSTNYYYYMDVW 3 602 CAKVNWGIAAAGSYAFDIW 3 603CAHRVRGMTSSSWYYGTFDYW 3 604 CVRPGATAGTLLTV 2 605 CARTYSSWYRGGPLSP 2 606CARPQRYSSSWYDDYYYGMDVW 2 607 CARPIAAAGSRGVRYFDYW 2 608CARGVLAPLYSSTLKLRFSVWTS 2 609 CARGLVAAAGTRRGWFTP 2 610CARGLVAAAGTRRGWFDPW 2 611 CARDSGQIVAAVTLDYW 2 612 CARAPSIPVAGIGYHFDHW 2613 CAQRRPNSSTWYRPLTLTT 2 614 CAKVNWGYSSCWFLRFLIS 2 615 CAKEGVPIAAPGLTT2 616 CAHSRAAAGSLTT 2 617 CAHSRAAAGSFDYW 2 618 
CAHRVRGNDKQQLVLWGPLTT 2619 CAHRVRGMTSSSWYYGTLTT 2 620 IGHD2-15 123 CARVEGGLTT 15 621CAKGWTAARGALNTSST 14 622 CARFWSGVGLTT 7 623 CAREVYLYCNGGRCYWRGSSP 5 624CARSEYCRGGNCYFNGYYFDSW 3 625 CARAPYCSGGSCYLFDYW 3 626CARGFVVVVTATLGTTITPWTS 2 627 CARDLCGGSCSRTTGSTP 2 628CARAGYCSGGSCYGWFDPW 2 629 CAKTKTGTTKINTTLTT 2 630 CAKNGILTGWVNGYTTLTT 2631 CAKGQTTAILGSTDFNWFDPW 2 632 CAKFHLQPSLLMVRRSPTS 2 633 IGHD3-22 117CARDPRGAVGITTGPTH 9 634 CARGSPPGAVGFIGSTP 8 635 CAKDMGGITMIVVVMISLTT 6636 CARARRGHGSTTTWTS 5 637 CARDPPRMLLIS 4 638 CARDLSYYDSSGYYAYW 4 639CATYKYYDSSGFMTT 3 640 CATYKYYDSSGFHDYW 3 641 CARDLSYYDSSGYYAYV 3 642CAHRRPYYYDSSGYYYAFDYW 3 643 CLTMIVPT 2 644 CATVTGGSSGYYYHVYYFDYW 2 645CARVRYSGGILGLPLTT 2 646 CARGRRGIVVVIPKEVRFDYW 2 647 CARGIWVSTGYYRYYFDNW2 648 CAREGYDSGGYYYEVEAFDIW 2 649 CARDAGPITTTVAGIIMRLLTF 2 650CAHRRPITMIVVVITMPLTT 2 651 IGHD5-5 89 CARGGGKDLLASYLTT 19 652CTSRGYSYGAPRWD 7 653 CARRWGRGRDTAMNLTTTTVWTS 7 654 CSRGGPGTAMVST 3 655CARRGGGGGDTAMNLTTTTVWTS 3 656 CARHGDSFVQPRRTT 3 657 CAKHDGQSNTLTA 3 658CATNTAMGFNEAVLIS 2 659 CARRGYSSMGIWDLST 2 660 CARRGGRGRDTAMNLTTTTVWTS 2661 CARQGSRLFHYYYYYMDVW 2 662 CARHKPGYSYVFLTT 2 663 IGHD3-10 78CGREGAGSAPWTS 5 664 CARHRITMVRELSYTTTWTS 5 665 CARDSDQQHGVRGVIPMAVWTS 5666 CATTPLMTLVRGLTTTWTS 4 667 CARHQVSMVRGVTRSTGSTP 4 668CARDEWFGESEVTNLDAFDIW 3 669 CAHSEGRITMVRGVIGPFDYW 3 670CARGQTYGSGPRGFDPW 2 671 CARGLYGSGSYYIKRRKTGSTP 2 672 CARAMVRGVLALTT 2673 CAKVGVGSMTMDRGVMTT 2 674 IGHD3-3 77 CARDPFGVVISTVWTS 14 675CARDRGRVLRFCPQGVPSLTT 8 676 CASQTYYDFGVVIILLTTLTT 5 677 CASLDFGVVIILTS 4678 CARHRSITILEWFVNHETGSTP 4 679 CAQSHYDFGVVIILIPGSTP 4 680CARGSPHYDFGVEIRTGSTP 3 681 CARDPLWSGYFYGMDVW 3 682 CARVGTYDFGVVMSNS 2683 CARHRSITIFGVVRKSRNWFDPW 2 684 CARDRFSLNSAFGVVEGSYWFDPW 2 685CAQSHYDFWSGYYSNTGFDPW 2 686 IGHD1-26 51 CASVGGTRGPGDPGLGT 12 687CAKGGFIVGATLTT 4 688 CAKEGGRIVGATMTT 3 689 CARVRYSGRYSRSTVDYW 2 690CARLKCGLTTCLHKTLIS 2 691 CARDSVGATTTDYW 2 692 IGHD6-19 50CAHPGSGWPLTTLTT 6 693 CARGCSVAGTGSSTP 4 694 CARARITVAAPYDYW 4 695CARLISSGWYLTT 3 696 
CARTSLEQQLVFMTENSSGWSFDYW 2 697 CARGGIAVAGTRIKTTT 2698 CARDQQWLPDYV 2 699 CARARITVAAPYDY 2 700 CARARITVAAPMTT 2 701CAKGVGSGWYDFFDYW 2 702 CAKGPREQWLAPYWYFDLW 2 703 IGHD3-9 39CARGGSLVLDVLTT 17 704 CARGGSLMLDVLTT 10 705 CASGPYFDWLLTYMDVW 2 706CARGPLYDILTGPTPTTTTTWTS 2 707 CARGGSIVLDVLTT 2 708 IGHD2-8 30CAKWGGNSSWKS 7 709 CARRSWCTNGVCYYISVALVTGSTP 6 710 CARGSRYCTNGVCYFWFDPW3 711 CARDVLGYCTATACWRGGPNHYYYGMDV 3 712 W IGHD5-24 28 CARGIEMATILLTT 16713 CARGSRWLQFFDYW 3 714 CAKGGERWLQSGATTLTT 2 715 IGHD6-6 22CTRGLVIEDIAARPGGA 2 716 CASDRGVQLVQDYYFGMDVW 2 717 IGHD5-12 21CARNARGGVATIFRGSTP 8 718 CARIQVATIDPKPKRLPSVWTS 2 719 IGHD4-17 19CARDWNGDYDYYYYGMDVW 6 720 CARDWNGDYTTTTTVWTS 2 721 IGHD2-2 16CARDRSSTSCCHFDYL 2 722 IGHD2-21 13 CARGPAYCGSDCYSYFQHW 2 723 IGHD4-23 9CARGGDYGGTPLTT 6 724 IGHD1-7 6 CARDGPPRITGTTEVTT 3 725 IGHD1-1 6CARRVGASGTSIS 4 726 IGHD3-16 4 CARAHYDYVWGSYRSPPTT 2 727 IGHD1-20 4IGHD4-4 3 CARPVTTGTHRGYFDLW 2 728 (unidentified) 834 CASVGGTRVPGDPGLGT35 729 CAHLTITFGEFSERMLSTS 29 730 CARLGYYDRRTT 27 731 CAGEVVIWNSMTT 18732 CARGARGDNSTMT 15 733 CARGGSRWPRTTLTT 13 734 CARMGGPPTGTSIS 12 735CVRGGLYTIPT 11 736 CARGGCGNYCPTTTSWTS 11 737 CARRDSSRGTTLTT 10 738CARTTGTTTTTTWTS 8 739 CARLSRYSNSPPSLTT 8 740 CARHLGVRGPWALFIS 7 741CARDPPRMLLIS 7 742 CAKGDIVTT 7 743 CARGGGVSSRRITSTP 6 744 CAREGVRSLTT 6745 CAKDKTYDTHGYSPF 6 746 CASLLLPTVTGGVLLIS 5 747 CARDYGATGSLDC 5 748CARDFGSGGVLITWPS 5 749 CARYPGIEVTGTGALTT 4 750 CARRGDVGNYCPTTTSWTS 4 751CARLPGITTTTTTWTS 4 752 CARHVKPVDGNAYYEDSV 4 753 CARGTRGISEPTKFDYW 4 754CARGGPERQLDDS 4 755 CAHRRPDSSTWYAPTLTT 4 756 CVSRRQTTPTSTVGPS 3 757CVRKEVMYFDP 3 758 CGDTLGETMPVTA 3 759 CATRRGQFWTT 3 760CARVVGGGVTTTTTVWTS 3 761 CARVLLSGSTWYAEYFQSW 3 762 CARTLSATGDNWFGPW 3763 CARTGARGDNSTMTS 3 764 CARQTPGTLQTTTTTTVWTS 3 765 CARPRYDYGLLLIS 3766 CARLTRRTTVVPRTSTT 3 767 CARHVKPVDGNAYYEDSW 3 768 CARHRGVRGPWALFIS 3769 CARGSPPGAVGFIGSTP 3 770 CARGLSSSRSLSSTP 3 771 CARGGATPGG 3 772CAREVPTGPRTSTTVWTS 3 773 CARDPRADYLAFDIW 3 774 CANGDTARPTGTLAT 3 
775CAKAPSDTIIVHGPQHLTT 3 776 CAARGRTTLTT 3 777 CVRGSGRTGEAT 2 778CVREARTPATTYGWYYYDYW 2 779 CVRDNSWSSRDAERYYYNMDVW 2 780CVRDLAWRTQQLLSENWFDPW 2 781 CVRDLAWRTQQLLSEIGSTV 2 782CVRDLAWRTEQLLSENWFDPW 2 783 CVRDLAWRTEELLSENWFDTW 2 784CVRDLAWRTEELLSENWFDPW 2 785 CVRDLAWRTEELLSEIGSTL 2 786 CVHRPRWLNIVANV 2787 CTWWQQLGEFLTS 2 788 CTSLTSMVNFMLLMS 2 789 CTRQEESSAAGTGGTSSP 2 790CTRDGVRGDLNPTLNV 2 791 CMRHQHQRPRTT 2 792 CITDCTGGSCDFAGPGEYW 2 793CATYYYKLVVIDTLTT 2 794 CATGAATVLLTT 2 795 CASRPGHHSGPLTT 2 796CASRPGHHSGPFDYW 2 797 CASPVGGGET 2 798 CARWPPIQGELLIS 2 799CARVRSGLLPTTTTTWTS 2 800 CARVQLIGDSGYRPWTT 2 801 CARVLRGPTTLTT 2 802CARQWGIRGVALTT 2 803 CARPRYDLRFCLLIS 2 804 CARNTEATTT 2 805CARMPGKEIAMADLATLTT 2 806 CARLTRRTTVGTPDIDYV 2 807 CARHVKPVDGNAYYEDS 2808 CARHLVW 2 809 CARHDPVPQFKHGWTS 2 810 CARGGPGRQLTMT 2 811CARGGGKDLLASYLTT 2 812 CARGARSGSSMTA 2 813 CARDYGATGSLDCW 2 814CARDVIGAAASYVAFDIW 2 815 CARDEWFGSPKSRTLMLLIS 2 816 CARAQNWDLLTGTSIS 2817 CARAPSIPVAVSATTLTT 2 818 CAKHDGQSNTPDCW 2 819 CAKGWTAARGALNTSST 2820 CAKGPPVVTTLDTSST 2 821 CAKDRGGS 2 822

(Table 2-4) Comparison of D repertoires among subclasses; vertical axis: frequency (%), horizontal axis: gene name

TABLE 2-4 SEQ ID gene name num reads CDR3 amino acids num reads NO:IGHA1 IGHD3-22 35 CARRPIPPLTMRVVVIPLTS 5 823 CARDPPMPMIVVQTLTT 2 824CARDPPMPMILVQTLTT 2 825 CAKILITMILVVSLMLLIS 2 826 IGHD6-13 27CGRSRHSSSWQILTP 11 827 CANGGLAAAGDHLTT 5 828 CARAPSIPVAGIGYHFDHW 3 829CARAPSIPVAGIATTLTT 2 830 IGHD3-10 26 CAREIRGTTMVRELTTSTATWTP 6 831CVRTYYFGLGDIITEITSTVWTS 3 832 CARRTYYYGSTNLTT 2 833 CARMVTDYYGSGNRGWFDPW2 834 CARDYYGSGVMTL 2 835 IGHD5-5 19 CMGPGDTAI 7 836 CARRPREMESAMVLSLTT2 837 IGHD3-9 19 CAHSAPYYDILSRNRARSWKDFDNW 3 838CAHSAPIMIFCLVTAHEVGRILTT 3 839 CATVALLRYFDWSSTR 2 840 IGHD3-3 17CTRRGGVVIICLTT 2 841 CIHTGNDFWTGTNYGLTS 2 842 CAKDRFSGRGRFEFMEWLTPLTT 2843 CAKDRFSEGKVQFMEWLTPLTT 2 844 IGHD6-19 14 CARGRRSLPAIVYSSGPDRPNWFDPW6 845 CTSAAVASSSGWPLRGVWTS 3 846 IGHD2-2 14 CARAPLCSSASCHLQLDYW 5 847IGHD1-26 13 CARDDSASYSRGTT 3 848 IGHD2-21 12 CARSHIVVVTAIPLEMLLIS 2 849IGHD4-23 11 CARGAGYGGNSGVRTT 9 850 IGHD2-15 11CAKDLAPLKSCSRGGCYPYYYGLDIW 3 851 IGHD4-17 9 CARTLYGDFVDF 2 852 IGHD6-6 5IGHD5-12 4 CARHVNGYDYLFPFTSW 3 853 IGHD3-16 4 CAKGVLSSGGVIATLPGSTP 3 854IGHD1-1 3 CARGGSQLERRRPLVTT 3 855 IGHD5-24 2 IGHD2-8 2 (unidentified)531 CARETVGGTLTT 19 856 CARTSSGHDPPIITGWTS 13 857 CARSPIWFGSHRFTTTWRS 9858 CARDPLETGATSLII 9 859 CAKLGNRPGFTEWDHWFGPW 9 860 CVRDPHETGATTLIT 6861 CARIRKEVGAPPITWTS 6 862 CARGSWSGAAFYSLTT 6 863 CARDPNKFRTNHLSTT 6864 CATVPELTDISLPRLMALIS 5 865 CARVWGKHTLTT 5 866 CARDPNKFRPNHLSTT 5 867CARAGRELLRALMTT 5 868 CARAGAELLRALMTT 5 869 CARAEDYYDTEGYFYLTP 5 870CAHRTNYSTNRYGAFTTLTS 5 871 CVRDPQETGATTLIT 4 872 CATQCLGGAGLTTTTAPWTS 4873 CARRTYYSGSTNLTT 4 874 CARRTTRETGSSIS 4 875 CVRQYGLGSGSLTP 3 876CVKIRNLIGFTGSTP 3 877 CTRDGVRGDLNPTLNV 3 878 CDKAKVTADLRT 3 879CATVPELPDISLPRLMALIS 3 880 CATVFGRRYRLLTT 3 881 CARYRAAYPRRAWTS 3 882CARTIGFEIAMTGGLGALTP 3 883 CARRDPPVRASLSTTLTS 3 884CARFQRYCRGGSCSATLDAFDKW 3 885 CARDLGERRDGEPMWFDAW 3 886 CARDLAVWATLTT 3887 CAKDVEPTVTLYNHFDP 3 888 CAKDFNWEGIT 3 889 CAHRTNYSTNRYGGLYYFDFW 3890 CVRGVGTILWLTI 2 891 CVRDAGPGGSLTS 2 892 
CTTGFSGSTACHWDHTACHWDDAFAMW2 893 CTHAVESLLGTTSTS 2 894 CIHTGNDFGPGPTMVWTS 2 895 CGVGRGDNDVDFKFKW 2896 CATRESPLTT 2 897 CATAGIELWRAGSTP 2 898 CARYRIAMATSPYFDYW 2 899CARTNFGSGGYILGDTTMVWTS 2 900 CARSAGYLHRRTS 2 901 CARRTYYSGSTNFDYW 2 902CARRDLPFGASLSTTLTS 2 903 CARPGFSYGPRLTP 2 904 CARKKIPTAGYSSLTT 2 905CARGSWMGRPFISLTT 2 906 CARGLRWADN 2 907 CARGGTSGLILDTTSTPWTS 2 908CAREMHIDSLTVGRAFDIW 2 909 CARDVPDIYSSGATDC 2 910 CARDPSYLPTPALKT 2 911CARDPNKFRPNHFVDYW 2 912 CARDLGTTNYWLDTW 2 913 CAKQRASGNSLTI 2 914CAKEPKIVGRRRTTLIT 2 915 CAKDLGVCSEGAASSLVLIS 2 916CAHSAPYYDICLVTAHEVGRILTT 2 917 CAGLIGRFIPLTT 2 918 IGHA2 IGHD2-21 62CAKDMCGLWASCGGDCYSRRTTSLTT 41 919 CAKDMCGLWASCGGDCYSRRTASLTT 5 920CARGPNMAFVVVTAILMLLIS 4 921 CARAPDCGGSTCYSHPYYGMDVW 4 922CARDPRIVVVAPATHTPTTVWTS 2 923 IGHD3-3 20 CIYDFWSGGPHPTLTT 11 924CARIVNTEGFWSGFLTP 4 925 CARIVNTEGFGVVFLTP 2 926 IGHD2-15 18CARAPDCGGGTCYSHPTTVWTS 5 927 CARARIVVVVPATLTPTTVWTS 3 928CARAPDCGGGTCYSHPYYGMDVW 3 929 IGHD6-13 14 CALCPTPIAAAGSVTT 5 930CALCPNPYSSGWFCNYW 3 931 IGHD1-26 9 CARGPATAILGATPSLTP 3 932CVRHDYSDNDLSTNWFGP 2 933 IGHD3-10 7 CATYYYGSGSAGHNFDYW 2 934CARGPGLSVMIRGVITTPNHILIT 2 935 IGHD2-8 6 CANVGGADRNYCINGVRHNPNYLTT 5 936IGHD3-16 5 CARGFGARGVILT 2 937 IGHD5-12 1 CVLSRGLVATRTLDYW 1 938IGHD4-17 1 CARTLYGDFVDSL 1 939 IGHD3-22 1 CARDKQESSGSPRNYYFDYW 1 940(unidentified) 131 CAKGHQVRLRGRTGTSIS 11 941 CAGAPDCGVGAAPLTSTTVCTS 8942 CAGIGGATSTTTTTTWTS 6 943 CARRAAPHDYGHVLIF 5 944 CVRHDGSFTKTGSTP 4945 CVKIGAAH 4 946 CARLRCSNDNCAGHLYYYFSGLDIW 4 947 CARASLPRGLLI S 4 948CTKGGGRKTAGKFLTP 3 949 CARTARTGDL 3 950 CARIGHEFYSLTYSVNDVFDLW 3 951CAKGRGRRAAGKFLTT 3 952 CAKGAGRRAAGKFLTT 3 953 CVRFIGAYSNNWYPGYFDYW 2 954CASQSQNYYYYYMDVW 2 955 CASKKEILWAGPNLTT 2 956 CARVRCGLVASEGVLIS 2 957CARRAAPMTTGMFLIF 2 958 CARLRGGFPPVVKRVEVFLLTS 2 959 CARGRFARGGDDSLIS 2960 CAKAPGDLCRSTP 2 961 CAGIRGSNIYYHYYYMDVW 2 962 CADLPGIIGGEIT 2 963IGHG1 IGHD3-22 52 CAKITSMIVVLIPTMMLLMS 20 964 CARGSRARFSSDTSGYQYFDYW 4965 CARGVYLYYDSHAYSVLTT 3 966 CARVNYYDSVVLTT 2 967 
CARVNYYDSSRIDYW 2 968CARLPPFNNDDSSSYALYLTT 2 969 CARHSNYYYDTSGYRVLDAFDIW 2 970CARGGMDSYGYFYVGHYDYW 2 971 CAKITSMIVVLTPTMMLLMS 2 972 IGHD3-10 35CARLPRMVRGNWFHP 8 973 CARGAWAVRGVISWAGSTP 6 974 CAGSGSGSLLTTVWTP 4 975CVSITNSLLWFGELLIFDCW 2 976 IGHD6-13 31 CTRQEESSAAGTGGTSSP 7 977CALCPTPIAAAGSVTT 7 978 CATSEGDPVAAAGTKSWFDSW 3 979 CARLALLYGSSRYGATLTT 2980 CARGPSSTWYSFDYW 2 981 IGHD3-3 20 CAHSVGFILDFWSGYQNNWFDPW 4 982CAMGPTIFGVVFLGSLTS 2 983 CAHSVGFILDFWSGYQNNWFDPG 2 984CAHSVGFILDFGVVIRTTGSTP 2 985 IGHD3-16 19 CVRQSPLDDVWGVFAPVGSTP 11 986CVRQSPLDDVWGVFAPVGSTL 2 987 IGHD5-5 18 CARGVDTTMVRSTTLTT 7 988CARQDPYCSTSNCTMGGAMTLTT 5 989 IGHD5-24 14 CARTDGIRDGYNLHRVLTT 2 990CARTDAIRDGYNLHRVFDYW 2 991 IGHD3-9 11 CAREGRNYDSLTGDPWFDPW 2 992 IGHD1-711 CARGDCTTINCNTHSDYYGLDVW 3 993 CARTVGTGTTNGYLTS 2 994CAREIVLLSTATLTPTTTVWTS 2 995 IGHD4-17 10 CARHPKPPTVTSATT 2 996 IGHD5-129 CARQDSGYDYGYYHNGMDVW 2 997 IGHD2-2 9 CARHSLAYCSTTSCAVFDYW 2 998CARHGFEGREVVPPAMNEYYYYYMDVW 2 999 IGHD2-15 9 CALTGLNGRSCYSELLIS 2 1000IGHD4-23 7 CARGAGYGGNSGVRTT 6 1001 IGHD1-26 5 IGHD2-21 4 IGHD2-8 3(unidentified) 444 CARGATVGVETGSTP 32 1002 CARKGSRHGGSTP 28 1003CARQNGPSIGGGSTP 23 1004 CARGATPGAETGSTP 23 1005 CAKDTLGGMGGLTS 13 1006CARVRVLPEGVLISLRPLGSTTITWTS 11 1007 CATDRDSSWGTSLTT 9 1008CARRGGSTVTTGTSIS 6 1009 CARQDPYCSTSNCTMGGAMTLTT 4 1010 CAILPETQWYPRLTT 41011 CVHRPRWLNVVPT 3 1012 CVHRPRWLNVVPN 3 1013 CARLGKNHSQGVDYW 3 1014CARGFMVQASSVRLKRGQFLADSW 3 1015 CARGDWGTVTLATT 3 1016CARDNQPWRDARNLGGAFDVW 3 1017 CARDGLRPPPFMVTIQRGGLTT 3 1018CARAVGGFNSGWPSIGVPARSTP 3 1019 CARAVGGFNSGWPSIGVPARSTL 3 1020CAKSPKPWSQLVSTPIMPTPWTS 3 1021 CVRESTFYYFGPW 2 1022CVRDDDYSRTWYMGQGASSDYGMDVW 2 1023 CVKWVSGVLTSLTT 2 1024CATSGRSSAWYPDVFDIW 2 1025 CATNYCRGISCYPAPLTT 2 1026 CASMIALHHTLTS 2 1027CARYSPVDPSTLDFW 2 1028 CARVLDSSAHWYFDDW 2 1029 CARQNGPSIGGGSTL 2 1030CARQHSEWEILRLVFDHW 2 1031 CARLPRMVRVTGSTP 2 1032 CARIDYVSTWYYDQW 2 1033CARICAEREFLSLLTP 2 1034 CARGDCTTINCNTHSTTTVWTS 2 1035 CARGATVGVETGSTL 21036 CARGATLGVETGWTP 2 1037 
CAREYYGILYGYYFDYW 2 1038CARDNQPWRDARNLGVHLMC 2 1039 CARDGGLAGTGTLEY 2 1040 CARAGLVLGPYGMDIW 21041 CAKVAETLVSTGFDSYYAYSMDVW 2 1042 CAKTYDYGSRGFSILLIS 2 1043CAKGAGRRAAGKFLTT 2 1044 CAKAKRRSLGMQTLPTLRGRSDGFDVW 2 1045CAKADCGTGCFIVDDW 2 1046 IGHG2 IGHD3-10 24 CARGRYAGGVIITALTP 13 1047CSREVGRDYYGSGVIEITWTS 4 1048 CSREVGRDYYGSGSYRNYMDVW 3 1049 IGHD2-15 22CAKKEFILVVVITMMSLLMS 6 1050 CAKDMTAKACSDYW 3 1051 CARVMGCRGGRCDFRAFDIW 21052 CARRFCSGGICYFLTT 2 1053 CAKEGVYFSGGNHYDVAFNVW 2 1054 IGHD6-25 21CARVKGGIAGMAWTS 19 1055 IGHD6-19 8 CARDLGSGWFRFDP 2 1056 CARDLGSGWFGSTP2 1057 IGHD3-3 8 CARPSRCCYSGGGRLTL 4 1058 CARPSRCCYVRGGRLTL 2 1059IGHD5-24 7 CARGKRDAYNYYSHLDSW 2 1060 CARGKEMPTITTLILTP 2 1061 IGHD4-17 4CAKGENTVTTGQEYW 2 1062 CAKGENTVTTGQEY 2 1063 IGHD3-22 4 CARDPDF 2 1064IGHD2-8 4 CAKSHEICTNGVCHPPRFGQRSTP 2 1065 IGHD6-13 3 IGHD5-5 3 IGHD2-213 IGHD2-2 3 CASRYCTSDRCLGASGKPSFDTW 2 1066 IGHD4-23 2 IGHD1-20 2(unidentified) 317 CARGGPKKVVTAAHLSP 11 1067 CSTLGLGPPGGQTT 10 1068CARDHYDTRGVRMLLIS 10 1069 CARMVRGGGRTSSGYYYYYMDVW 9 1070 CARDGVWDLPTTLTT9 1071 CTMATVGHGLRRCFGKSTATLTS 6 1072 CVRMGPPCQLAGRSSSLTS 5 1073CSTLGLGPPGGLTT 5 1074 CMGPGETAI 5 1075 CARVSMIRFRVWGLWTS 5 1076CARVQRGAVVIPTT 5 1077 CARRRYNDLGAPNWVDPW 5 1078 CARGEDCGGGRCNNLPTTVWTS 51079 CAKRKLAPPRKFTTLTT 5 1080 CATLEGGAPPDLRRAEAFLLIS 4 1081CARGKDCGGGRCNNVPYYGMDVW 4 1082 CAKDGHKLTGTTTRTS 4 1083 CVRDLGAITPVFSTS 31084 CARSFVVKVHAHCGAVLSST 3 1085 CARRLNVAVVVPAYVGWFDPW 3 1086CARGKDCGGGRCNNVPTTGWTS 3 1087 CARDWEWQQRLNYFDP 3 1088 CVRRAAGGRSGLTT 21089 CVRPPPTVPGTAGSTP 2 1090 CVALFVPAGSTL 2 1091CTMATVGHGATTLFREVHRNTDFW 2 1092 CSTLGLGPRGADYW 2 1093 CSRTGGRLLIS 2 1094CSKVGRILKLIT 2 1095 CKVAVEMVLMY 2 1096 CGKFLGTTVASS 2 1097CATLTGGAPPDLRRAEAFLLIS 2 1098 CATEGTGAVTPFTT 2 1099 CATAPGGTSYT 2 1100CASRPSWGSSFDFW 2 1101 CASRPPGAAALTS 2 1102 CARRRYNDLGAPTGSTP 2 1103CARMVREEAERRPAIIITTWTS 2 1104 CARGPGWGMGSTKFDCW 2 1105 CARGPGGVWDRLSLTS2 1106 CARGGKSATGANYHQFFDCW 2 1107 CARDWEWQQRLNYFDPW 2 1108CARDHYYDERNQGPDW 2 1109 CARAGGHGTWTS 2 1110 
CAKSLRVGGDVFEIW 2 1111CAKSDYFDP 2 1112 CAKGRGRLVTIATTLTT 2 1113 CAKAHFPGDLPSFSSIS 2 1114CAHQQWRPGRRGFDYW 2 1115

The above results show that the analysis technique of the present invention can complete the analysis quickly, within several minutes.

Analytical Test Example 2: Comparison of BCR Repertoires Among Specimens

The present Example compared BCR repertoires among specimens.

(Materials and Methods)

(Materials)

Read sets from 5 specimens were obtained by the same technique as in Analysis Example 1. Four specimens (No. 1-4) were from healthy individuals and one specimen (No. 5) was from a leukemia patient.

(Method)

A repertoire was derived for each class and each region of each sample by the same method as in Analysis Example 1, and the repertoires were compared among specimens.

(Results)

As examples of the results, FIGS. 36 (A and B) show a comparison of V repertoires in IgM, and FIG. 37 shows a comparison of J repertoires. Only specimen No. 5 differs significantly from the others.

Analytical Test Example 3: Comparison of TCR Repertoires of Healthy Individuals

The present Example compared TCR repertoires of healthy individuals.

(Materials and Methods)

(Materials)

Read sets from 10 specimens (No. 1-10), all from healthy individuals, were obtained by the same technique as in Example 1.

(Methods)

A repertoire was derived for each class and each region of each sample by the same method as in Example 1, and the repertoires were compared among specimens.

(Results)

Results are shown in FIGS. 38-41. FIGS. 38 (A-D) show a comparison of TRAV repertoires among specimens, FIGS. 39 (A-D) a comparison of TRBV repertoires, FIGS. 40 (A-D) a comparison of TRAJ repertoires, and FIG. 41 a comparison of TRBJ repertoires.

Each result was obtained within several minutes.

The present analysis method enables analysis of the C region, which is not provided by High-V-QUEST, a commonly used tool. A further advantage of the present system is that either “gene name” or “allele” can be selected as the unit of analysis for each region. Although not wishing to be bound by any theory, this is an advantage because such a selection is not possible with the commonly used High-V-QUEST. A current issue with the High-V-QUEST method is insufficient classification of D regions, and the present system can be considered to solve this issue. Specifically, the database content for D regions in High-V-QUEST is insufficient, so every D region sequence that is not similar to a database record is classified as “no hit”. In contrast, the present system can use the CDR3 sequence instead of a D gene name/allele as a classification category, so that classification can be carried out as far as is currently possible. The present system also places no limit on the number of sequences analyzed. Although not wishing to be bound by any theory, this is so that even deeper sequencing, if performed to search for rare clones, can be analyzed without any change. Instead, a job-queue management function that limits the number of analysis jobs processed simultaneously (when all slots are full, new jobs are automatically processed later) is introduced to prevent depletion of computational resources. This overcomes the disadvantage of High-V-QUEST, which limits the maximum number of sequences.
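The job-queue behaviour described above (unlimited job size, but a fixed number of jobs running at once, with the rest waiting) can be sketched as follows. This is an illustrative sketch only, not the implementation used by the present system: the class name `AnalysisJobQueue`, the limit of 2 concurrent jobs and the stand-in job `fake_repertoire_job` are all hypothetical. It relies on Python's `ThreadPoolExecutor`, whose internal queue is unbounded, so a job submitted while all workers are busy simply starts later.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class AnalysisJobQueue:
    """Run at most `max_workers` analysis jobs at once; later jobs wait their turn.

    Hypothetical sketch: ThreadPoolExecutor keeps an unbounded internal queue,
    so submission never fails when every worker is busy; the job just starts
    when a slot frees up.
    """

    def __init__(self, max_workers: int = 2) -> None:  # limit value is an assumption
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.running = 0  # jobs executing right now
        self.peak = 0     # highest number of simultaneous jobs observed

    def _run(self, job: Callable[..., Any], *args: Any) -> Any:
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        try:
            return job(*args)
        finally:
            with self._lock:
                self.running -= 1

    def submit(self, job: Callable[..., Any], *args: Any) -> Future:
        """Queue a job of any size; it runs as soon as a worker is free."""
        return self._executor.submit(self._run, job, *args)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def fake_repertoire_job(n_reads: int) -> int:
    """Hypothetical stand-in for one repertoire analysis job."""
    time.sleep(0.01)
    return n_reads
```

Submitting, for example, six jobs to `AnalysisJobQueue(max_workers=2)` runs them two at a time; a production service would more likely persist queued jobs (e.g. in a database) so they survive restarts, which this in-memory sketch does not attempt.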

(Examples of Analysis System)

(Example 1 of analysis system: Diagnostic application in T-cell large granular lymphocyte leukemia (T-LGL))

An experiment was performed to confirm the applicability of the system of the present invention to the diagnosis of T-cell large granular lymphocyte leukemia (T-LGL).

Sample: Peripheral blood mononuclear cells derived from a patient with T-cell large granular lymphocyte leukemia

Method

(RNA Extraction)

7 mL of whole blood was collected in a heparin-containing blood collection tube from one patient with T-cell large granular lymphocyte leukemia. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from the isolated 1.66×10⁷ PBMCs using an RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany). The resulting RNA was quantified by absorbance at 260 nm (A260) using a spectrophotometer. The total amount of RNA was 15 μg.

(Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA sample was used to carry out adaptor-ligation PCR. First, to synthesize complementary DNA, the BSL-18E primer (Table 3-1A) and 3.5 μL of RNA were mixed and annealed for 8 minutes at 70° C. After cooling on ice, reverse transcription was performed in the presence of an RNase inhibitor (RNasin) with the following composition to synthesize complementary DNA.

TABLE 3-1A Synthesis of complementary DNA
Reagent | Content (μL) | Final concentration
RNA solution | 3.5 |
200 μM BSL-18E | 1.5 | 30 μM
Total | 5 |
70° C., 8 minutes
5× First strand buffer | 2 | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂
0.1 M DTT | 1 | 10 mM
10 mM dNTPs | 0.5 | 500 μM
RNasin (Promega) | 0.5 | 2 U/μL
Superscript III™, 200 U/μL (Invitrogen) | 1 | 20 U/μL

The complementary DNA was then incubated for 2 hours at 16° C. in the following double-stranded DNA synthesis buffer in the presence of E. coli DNA polymerase I, E. coli DNA ligase and RNase H to synthesize double-stranded complementary DNA. T4 DNA polymerase was then added and reacted for 5 minutes at 16° C. to perform a 5′ terminal blunting reaction.

TABLE 3-1B1/3-1B2 Synthesis of complementary DNA
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 9 |
Sterilized water | 46.5 |
5× Second strand buffer | 15 | 25 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.15 mM β-NAD⁺, 1.2 mM DTT
10 mM dNTPs | 1.5 | 0.2 mM
E. coli DNA ligase, 10 U/μL (Invitrogen) | 0.5 | 0.067 U/μL
E. coli DNA polymerase, 10 U/μL (Invitrogen) | 2 | 0.27 U/μL
RNaseH, 2 U/μL (Invitrogen) | 0.5 | 0.013 U/μL
Total | 75 |
16° C., 2 hours
T4 DNA polymerase, 5 U/μL (Invitrogen) | 1 | 0.067 U/μL
16° C., 5 minutes

After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the double-stranded DNA was incubated overnight at 16° C. with the P20EA/10EA adaptor (Table 3-1A) and T4 DNA ligase in the following T4 ligase buffer for a ligation reaction.

TABLE 3-1C Adaptor adding reaction
Reagent | Content (μL) | Final concentration
Complementary double stranded DNA solution | 12.5 |
T4 ligase buffer | 5 | 50 mM Tris-HCl, pH 7.6, 10 mM MgCl₂, 1 mM ATP, 5% PEG5000, 1 mM DTT
50 μM P20EA/10EA adaptor | 5 | 10 μM
T4 DNA ligase, 1 U/μL (Invitrogen) | 2.5 | 0.1 U/μL
Total | 25 |
16° C., overnight

The adaptor-ligated double-stranded DNA, purified by a column in the same way as described above, was digested with NotI restriction enzyme (50 U/μL, Takara) using the following composition to remove the adaptor added to the 3′ terminal.

TABLE 3-1D1/3-1D2 Restriction enzyme treatment
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 34 |
10× restriction enzyme buffer | 5 | 50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM, 1 mM DTT, 100 mM NaCl
0.1% BSA | 5 | 0.01%
0.1% Triton X-100 | 5 | 0.01%
NotI, 50 U/μL (Takara) | 1 | 1 U/μL
Total | 50 |
37° C., 2 hours

3. PCR

The 1^(st) PCR amplification was performed from the double-stranded complementary DNA using the common adaptor primer P20EA and a TCRα chain or β chain C region specific primer (CA1 or CB1). 20 cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds and 72° C. for 1 minute) were performed with the composition shown below.

TABLE 3-1E 1^(st) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CM1, CA1, CG1, CD1 or CE1 primer | 0.5 | 250 nM
Double stranded complementary DNA | 2 |
Sterilized water | 7 |

The 1^(st) PCR amplicon was then used for 2^(nd) PCR with the reaction composition shown below, using the P20EA primer and a TCRα chain or β chain C region specific primer (CA2 or CB2). 20 cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds and 72° C. for 1 minute) were performed.

TABLE 3-1F1/3-1F2 2^(nd) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CM2 (CA2, CG2, CD2 or CE2) primer | 1 | 500 nM
1^(st) PCR amplicon | 2 |
Sterilized water | 6 |

Primers were removed from the obtained 2^(nd) PCR amplicon with a High Pure PCR Cleanup Micro Kit (Roche). For analysis with Roche's next generation sequencer (GS Junior Bench Top system), the 2^(nd) PCR amplicon diluted 10-fold was then used as a template. Amplification used the B-P20EA primer, which is the P20EA adaptor primer with an adaptor B sequence added, together with HuVaF-01 to HuVaF-10 (α chain) and HuVbF-01 to HuVbF-10 (β chain), which are TCRα chain or β chain C region specific primers with an adaptor A sequence and an MID Tag sequence (MID-1 to 26) added. 10 cycles of PCR (95° C. for 30 seconds, 55° C. for 30 seconds and 72° C. for 1 minute) were performed.
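As a rough guide to what the cycle numbers above imply, the idealized fold amplification after n cycles at per-cycle efficiency E is (1 + E)^n. The sketch below is only an upper bound under that textbook assumption: each round actually starts from a small aliquot of the previous product (2 μL, 2 μL, then 1 μL of a 10-fold dilution) and real reactions plateau, so actual yields are far lower. The function name and the `ROUNDS` mapping are illustrative, not part of the described protocol.

```python
def theoretical_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold amplification (1 + E) ** n for n cycles at efficiency E.

    Assumes exponential amplification with no plateau; an upper bound only.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


# Cycle numbers of the three PCR rounds described in this Example.
ROUNDS = {"1st PCR": 20, "2nd PCR": 20, "3rd PCR": 10}

for name, n in ROUNDS.items():
    print(f"{name}: {n} cycles -> up to {theoretical_fold(n):.3g}-fold")
```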

TABLE 3-1G 3^(rd) PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer | 1 | 500 nM
10 μM HuVaF or HuVbF primer | 1 | 500 nM
2^(nd) PCR amplicon | 1 |
Sterilized water | 7 |

After agarose gel electrophoresis, the visualized band of about 600 bp was excised and purified using a DNA purification kit (QIAEX II Gel Extraction Kit, Qiagen). The amount of DNA in the recovered PCR amplicon was measured using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen).

4. Next Generation Sequencing

Next generation sequencing was carried out with Roche's GS Junior sequencer. Specifically, emPCR was carried out with a GS Junior Titanium emPCR Kit (Lib-L) according to the manufacturer's protocol at a DNA-to-bead ratio (copies per bead, cpb) of 0.5. After emPCR, the beads were recovered by bead enrichment, and a sequencing run was carried out with the GS Junior Titanium Sequencing Kit and PicoTiterPlate Kit according to the manufacturer's protocol.
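Setting a copies-per-bead ratio requires converting the PicoGreen mass concentration of the ~600 bp library into a molecule count. The sketch below shows that arithmetic under stated assumptions: an average of 660 g/mol per base pair of dsDNA (a common approximation, not given in the source) and a bead count per emPCR reaction supplied by the user (`n_beads` is a hypothetical parameter; the real figure comes from the kit protocol). Function names are illustrative.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
MEAN_BP_MASS_G_PER_MOL = 660.0    # approximate mass of one dsDNA base pair (assumption)


def molecules_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into molecules per microlitre."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * MEAN_BP_MASS_G_PER_MOL)
    return moles_per_ul * AVOGADRO


def library_volume_ul(conc_ng_per_ul: float, length_bp: int,
                      n_beads: float, cpb: float = 0.5) -> float:
    """Library volume that supplies `cpb` copies per bead for `n_beads` capture beads."""
    return cpb * n_beads / molecules_per_ul(conc_ng_per_ul, length_bp)
```

A 1 ng/μL solution of a 600 bp amplicon contains roughly 1.5×10⁹ molecules/μL, so in practice the library is first diluted to a working stock before pipetting; the dilution scheme should follow the manufacturer's protocol.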

5. Data Analysis

The resulting sequence data (SFF file) were sorted into read sequences for each MID Tag, and a sequence file in FASTA format was created with software supplied with the GS Junior (sfffile or sffinfo). For sequence read analysis, the V, D, J and C sequences of each read were assigned using V, D, J and C sequences obtained from the IMGT database (the international ImMunoGeneTics information system, www dot imgt dot org) as reference sequences. Newly developed software (Repertoire Genesis) was used for the assignment. For TCRα, 22,833 reads were obtained, of which 16,407 reads (71.9%) were assigned, and the number of unique reads was 1,705. For TCRβ, 121,080 reads were obtained, of which 81,542 reads (67.3%) were assigned, and the number of unique reads was 9,224. Read frequencies were examined by treating reads having the same TRAV gene, TRAJ gene and CDR3 sequence as one unique read (Table 3-1). Likewise, frequencies were examined for reads having the same TRBV gene, TRBJ gene and CDR3 sequence (Table 3-2). As a result, in the TRA repertoire, reads with TRAV10, TRAJ15 and CVVRATGTALIFG (SEQ ID NO: 1450) accounted for 1,971 reads (12.53%), suggesting the possibility of clonal expansion of cells expressing a specific TCR. Further, in the TRB repertoire, reads with TRBV29-1, TRBJ2-7 and CSVERGGSLGEQYFG (SEQ ID NO: 1500) accounted for 22,568 reads (28.57%). These results suggest the possibility of monoclonal expansion of T cells expressing a TCR consisting of a TCRα chain having TRAV10 and TRAJ15 and a TCRβ chain having TRBV29-1 and TRBJ2-7. Various diversity indices were compared between 10 healthy individuals and the LGL patient (Table 3-3). The Shannon-Weaver index (H′), Simpson's diversity index (1−λ), inverse Simpson's index (1/λ) and Pielou's evenness index (J′) were all lower in the LGL patient than in the healthy individuals, demonstrating reduced diversity.
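The collapsing step described above (reads sharing the same V gene, J gene and CDR3 sequence counted as one unique read, then ranked by frequency as in Tables 3-1 and 3-2) can be sketched as below. This is a minimal illustration, not the Repertoire Genesis software, whose interface is not described here; it assumes reads have already been assigned to (V, J, CDR3) tuples. The denominator used for the percentages in Tables 3-1 and 3-2 is not stated, so this sketch simply divides by all assignments passed in.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Clonotype = Tuple[str, str, str]               # (V gene, J gene, CDR3 amino-acid sequence)
Row = Tuple[int, str, str, str, int, float]    # (rank, V, J, CDR3, reads, % reads)


def rank_clonotypes(assignments: Iterable[Clonotype], top: int = 50) -> List[Row]:
    """Collapse assigned reads into unique (V, J, CDR3) reads and rank them by count."""
    counts = Counter(assignments)
    total = sum(counts.values())
    rows: List[Row] = []
    # most_common() orders by count; ties keep first-seen order.
    for rank, ((v, j, cdr3), n) in enumerate(counts.most_common(top), start=1):
        rows.append((rank, v, j, cdr3, n, round(100.0 * n / total, 2)))
    return rows
```

`len(Counter(assignments))` gives the number of unique reads in the same sense used above.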

6. Utility in Diagnosis

In LGL patients, minimal residual disease is expected to be detectable after therapy such as drug therapy by using, as an indicator, sequence reads having TRAV10/TRAJ15/CVVRATGTALIFG (SEQ ID NO: 1450) or TRBV29-1/TRBJ2-7/CSVERGGSLGEQYFG (SEQ ID NO: 1500). Further, the therapeutic effect on leukemia cells can be measured by quantitative analysis of read frequencies. The results also suggest that various diversity indices may be used to predict the presence of a disease involving increased clonality.
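Tracking the indicator clonotype before and after therapy reduces to comparing its read frequency between samples. The sketch below is a hypothetical illustration: it assumes reads have already been assigned to (V, J, CDR3) tuples, uses the TRB indicator from Table 3-2 as the default, and, when the clonotype is not detected at follow-up, substitutes 1/N (N = follow-up reads) so that the returned log10 reduction is a lower bound. That convention is an assumption, not part of the described method, and clinical detection limits would need separate validation.

```python
import math
from typing import Sequence, Tuple

Clonotype = Tuple[str, str, str]  # (V gene, J gene, CDR3 amino-acid sequence)

# Indicator clonotype taken from Table 3-2 (SEQ ID NO: 1500).
INDICATOR_TRB: Clonotype = ("TRBV29-1", "TRBJ2-7", "CSVERGGSLGEQYFG")


def indicator_frequency(assignments: Sequence[Clonotype],
                        indicator: Clonotype = INDICATOR_TRB) -> float:
    """Fraction of assigned reads that carry the indicator clonotype."""
    if not assignments:
        raise ValueError("no assigned reads")
    hits = sum(1 for a in assignments if a == indicator)
    return hits / len(assignments)


def log10_reduction(baseline: float, follow_up: float, n_follow_up: int) -> float:
    """log10 fall in indicator frequency; a lower bound if undetected at follow-up."""
    floor = follow_up if follow_up > 0 else 1.0 / n_follow_up
    return math.log10(baseline / floor)
```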

(Table 3-1) TRA read (top 50) (SEQ ID NOs: 1450-1499)

TABLE 3-1-1
SEQ ID NO: | Rank | TRAV | TRAJ | CDR3 | Reads | % Reads | Frame*
1450 | 1 | TRAV10 | TRAJ | CVVRATGTALIFG | 1971 | 12.53
1451 | 2 | TRAV13-2 | TRAJ | CAEGGTRAGEHLLL | 380 | 2.42 | out
1452 | 3 | TRAV12-1 | TRAJ | CVAYTGRRALTFG | 337 | 2.14
1453 | 4 | TRAV26-1 | TRAJ | CIVLIQGAQKLVFG | 328 | 2.08
1454 | 5 | TRAV2 | TRAJ | CAVNMDSNYQLIWG | 300 | 1.91
1455 | 6 | TRAV2 | TRAJ | CAVETRGNTGKLIFG | 293 | 1.86
1456 | 7 | TRAV10 | TRAJ | CVVSGPYNNDMRFG | 277 | 1.76
1457 | 8 | TRAV2 | TRAJ | CAVAVVDSSYKLIFG | 272 | 1.73
1458 | 9 | TRAV1-1 | TRAJ | CAGFYNQGGKLIFG | 263 | 1.67
1459 | 10 | TRAV2 | TRAJ | CAVEDRAAGNKLTFG | 263 | 1.67
1460 | 11 | TRAV9-2 | TRAJ | CALSSTNSGYALNFG | 255 | 1.62
1461 | 12 | TRAV9-2 | TRAJ | CALRGNTNAGKSTFG | 242 | 1.54
1462 | 13 | TRAV10 | TRAJ | CVVSARYSREEETNSP | 241 | 1.53 | out
1463 | 14 | TRAV13-2 | TRAJ | CAENSKTSGSRLTFG | 236 | 1.50
1464 | 15 | TRAV9-2 | TRAJ | CALSWDTGRRALTFG | 234 | 1.49
1465 | 16 | TRAV10 | TRAJ | CVVTTVDDMRFG | 223 | 1.42
1466 | 17 | TRAV1-2 | TRAJ | CSLSGSARQLTFG | 222 | 1.41
1467 | 18 | TRAV12-3 | TRAJ | CAMRPFGNEKLTFG | 219 | 1.39
1468 | 19 | TRAV9-2 | TRAJ | CALYGGATNKLIFG | 218 | 1.39
1469 | 20 | TRAV12-3 | TRAJ | CAMSEGNAGNMLTFG | 216 | 1.37
1470 | 21 | TRAV9-2 | TRAJ | CALSPLSGGSYIPTFG | 206 | 1.31
1471 | 22 | TRAV12-1 | TRAJ | CVVRKYWRLQNYLW | 181 | 1.15 | out
1472 | 23 | TRAV9-2 | TRAJ | CALRFLPGGGYNKLIF | 180 | 1.14
1473 | 24 | TRAV9-2 | TRAJ | CALVPRRGATNKLIFG | 176 | 1.12
1474 | 25 | TRAV13-1 | TRAJ | CAADDYKLSFG | 172 | 1.09

TABLE 3-1-2
1475 | 26 | TRAV35 | TRAJ53 | CANSGGSNYKLTFG | 166 | 1.06
1476 | 27 | TRAV6 | TRAJ27 | CALDGNACKSTFG | 165 | 1.05
1477 | 28 | TRAV13-1 | TRAJ39 | CAASSSGGNMLTFG | 162 | 1.03
1478 | 29 | TRAV9-2 | TRAJ13 | CALIGGYQKVTFG | 162 | 1.03
1479 | 30 | TRAV9-2 | TRAJ43 | CALRVTCAL | 162 | 1.03 | out
1480 | 31 | TRAV21 | TRAJ9 | CAVVEGTGGFKTIFG | 158 | 1.00
1481 | 32 | TRAV9-2 | TRAJ21 | CALGLMGNFNKFYFG | 155 | 0.99
1482 | 33 | TRAV2 | TRAJ9 | CAVDRNTGGFKTIFG | 153 | 0.97
1483 | 34 | TRAV13-1 | TRAJ36 | CAASRGANNLFFG | 147 | 0.93
1484 | 35 | TRAV9-2 | TRAJ41 | CALRPNSNSGYALNFG | 146 | 0.93
1485 | 36 | TRAV9-2 | TRAJ23 | CALNYNQGGKLIFG | 144 | 0.92
1486 | 37 | TRAV9-2 | TRAJ43 | CAGNNNDMRFG | 139 | 0.88
1487 | 38 | TRAV12-2 | TRAJ20 | CAVSNDYKLSFG | 135 | 0.86
1488 | 39 | TRAV12-2 | TRAJ48 | CAVPFGNEKLTFG | 134 | 0.85
1489 | 40 | TRAV6 | TRAJ22 | CALGASGSARQLTFG | 128 | 0.81
1490 | 41 | TRAV9-2 | TRAJ39 | CALSDRAGNMLTFG | 127 | 0.81
1491 | 42 | TRAV20 | TRAJ44 | CATHTGTASKLTFG | 124 | 0.79
1492 | 43 | TRAV9-2 | TRAJ6 | CALPPSGGSYIPTFG | 124 | 0.79
1493 | 44 | TRAV10 | TRAJ27 | CVVSPLTNAGKSTFG | 118 | 0.75
1494 | 45 | TRAV9-2 | TRAJ27 | CATRRGTPMQANQPL | 117 | 0.74 | out
1495 | 46 | TRAV2 | TRAJ40 | CAVETSYSGTYKYIFG | 116 | 0.74
1496 | 47 | TRAV26-1 | TRAJ44 | CIVRSHTTGTASKLTF | 114 | 0.72
1497 | 48 | TRAV9-2 | TRAJ8 | CASLFQKLVFG | 105 | 0.67 | out
1498 | 49 | TRAV38-2 | TRAJ54 | CAYRSENSGSPEAGIW | 104 | 0.66
1499 | 50 | TRAV21 | TRAJ33 | CAVTFGDSNYQLIWG | 103 | 0.65
*out: out-of-frame

Table 3-2 TRB Read (Top 50)

TABLE 3-2-1
SEQ ID NO: | Rank | TRBV | TRBJ | CDR3 | Reads | % Reads | Frame*
1500 | 1 | TRBV29-1 | TRBJ2-7 | CSVERGGSLGEQYFG | 22568 | 28.53
1501 | 2 | TRBV20-1 | TRBJ2-7 | CSARTLAGHYEQYFG | 5609 | 7.10
1502 | 3 | TRBV7-9 | TRBJ2-7 | CASSYPGTGNHEQYFG | 809 | 1.02
1503 | 4 | TRBV29-1 | TRBJ2-7 | CSVERGGSLGGAVLR | 770 | 0.97 | out

TABLE 3-2-2
1504 | 5 | TRBV29-1 | TRBJ2-5 | CSANPGQQLQETQYFG | 573 | 0.73
1505 | 6 | TRBV29-1 | TRBJ2-7 | CSVEREAPLGSSTS | 571 | 0.72 | out
1506 | 7 | TRBV29-1 | TRBJ2-7 | CSVERGGSLGEQTS | 542 | 0.69 | out
1507 | 8 | TRBV29-1 | TRBJ2-7 | CSVEERKGEQYFG | 514 | 0.65
1508 | 9 | TRBV15 | TRBJ | CATSRDGQQETQYFG | 510 | 0.65
1509 | 10 | TRBV29-1 | TRBJ2-7 | CSARTGDYYEQYFG | 486 | 0.62
1510 | 11 | TRBV7-2 | TRBJ | CASSLAGGSYNEQFFG | 465 | 0.59
1511 | 12 | TRBV29-1 | TRBJ2-7 | CSVSETGIYEQYFG | 460 | 0.58
1512 | 13 | TRBV20-1 | TRBJ2-7 | CSASRGLAGGSYEQYF | 446 | 0.56
1513 | 14 | TRBV20-1 | TRBJ2-7 | CSARTXRDIQQYFG | 435 | 0.55 | out
1514 | 15 | TRBV29-1 | TRBJ2-3 | CSALAGVGDTQYFG | 427 | 0.54 | out
1515 | 16 | TRBV29-1 | TRBJ2-7 | CSVERGRLPWGSSTS | 426 | 0.54
1516 | 17 | TRBV29-1 | TRBJ2-1 | CSVEVLAGGPNEQFFG | 425 | 0.54
1517 | 18 | TRBV29-1 | TRBJ2-1 | CSVTGTSGRATTSPSSYNEQFFG | 424 | 0.54
1518 | 19 | TRBV29-1 | TRBJ2-6 | CSVATGGDGANVLTFG | 384 | 0.49
1519 | 20 | TRBV29-1 | TRBJ2-7 | CSVGGLRDRPSYEQYFG | 381 | 0.48
1520 | 21 | TRBV29-1 | TRBJ2-3 | CSQIEGDTQYFG | 378 | 0.48
1521 | 22 | TRBV12-3 | TRBJ2-7 | CASSQTVYEQYFG | 371 | 0.47
1522 | 23 | TRBV29-1 | TRBJ2-7 | CSAVEARKSSYEQYFG | 357 | 0.45
1523 | 24 | TRBV29-1 | TRBJ2-3 | CSVGAGAGGTDTQYFG | 343 | 0.43
1524 | 25 | TRBV29-1 | TRBJ2-7 | CSVERGRLPWGAVLR | 312 | 0.39 | out
1525 | 26 | TRBV6-6 | TRBJ2-5 | CASTSSETQYFG | 311 | 0.39
1526 | 27 | TRBV29-1 | TRBJ2-7 | CSVERGGSLGSSTS | 300 | 0.38 | out
1527 | 28 | TRBV10-3 | TRBJ1-6 | CAISETPTSNNPHSSYNSPLHFG | 280 | 0.35
1528 | 29 | TRBV28 | TRBJ1-1 | CASMVGPANIEAFFG | 277 | 0.35
1529 | 30 | TRBV4-1 | TRBJ2-7 | CASSQYLISYEQYFG | 275 | 0.35
1530 | 31 | TRBV29-1 | TRBJ2-1 | CSVGGVSSYNEQFFG | 274 | 0.35
1531 | 32 | TRBV15 | TRBJ1-5 | CATSTQKNQPQHFG | 270 | 0.34
1532 | 33 | TRBV29-1 | TRBJ2-5 | CSVEAGVGETQYFG | 270 | 0.34
1533 | 34 | TRBV20-1 | TRBJ1-2 | CSAREGVGYGYTFG | 260 | 0.33
1534 | 35 | TRBV19 | TRBJ2-1 | CASRHIAGEVNEQFFG | 254 | 0.32
1535 | 36 | TRBV15 | TRBJ1-1 | CATSRDRQSDTEAFFG | 250 | 0.32
1536 | 37 | TRBV20-1 | TRBJ1-1 | CSARDQTAEAFFG | 244 | 0.31
1537 | 38 | TRBV9 | TRBJ1-2 | CASSVAARPYGYTFG | 244 | 0.31
1538 | 39 | TRBV25-1 | TRBJ1-3 | CASSEVREALETPYIL | 241 | 0.31 | out

TABLE 3-2-3
1539 | 40 | TRBV28 | TRBJ1-5 | CASTQNYAQPQHFG | 238 | 0.30
1540 | 41 | TRBV15 | TRBJ2-1 | CATSGQGKTYNEQFFG | 237 | 0.30 | out
1541 | 42 | TRBV5-1 | TRBJ1-4 | CASSYTGTGDENCFW | 236 | 0.30
1542 | 43 | TRBV20-1 | TRBJ1-1 | CSPDEAFFG | 226 | 0.29
1543 | 44 | TRBV14 | TRBJ1-3 | CASSQDFRSVSGNTIYFG | 225 | 0.28
1544 | 45 | TRBV7-3 | TRBJ2-3 | CASSLAGGVDTQYFG | 221 | 0.28
1545 | 46 | TRBV27 | TRBJ2-7 | CASSGTSGRYEQYFG | 220 | 0.28
1546 | 47 | TRBV29-1 | TRBJ2-1 | CSVVSWQVLDKSSSS | 214 | 0.27 | out
1547 | 48 | TRBV4-1 | TRBJ2-3 | CASSRPGQGLTQYFG | 207 | 0.26 | out
1548 | 49 | TRBV29-1 | TRBJ2-7 | CSVERGGSLGXQYFG | 206 | 0.26
1549 | 50 | TRBV4-3 | TRBJ2-7 | CASSQERGKYEQYFG | 202 | 0.26
*out: out-of-frame

Table 3-3 Diversity Index

TABLE 3-3
Diversity index | TCRα LGL | TCRα healthy mean | TCRα healthy minimum | TCRα healthy maximum | TCRβ LGL | TCRβ healthy mean | TCRβ healthy minimum | TCRβ healthy maximum
H′ | 5.0 | 7.5 | 6.3 | 8.4 | 5.3 | 7.1 | 6.0 | 8.0
1-λ | 0.976 | 0.997 | 0.990 | 1.000 | 0.912 | 0.996 | 0.983 | 1.000
1/λ | 41.1 | 975.1 | 103.2 | 4042.0 | 11.4 | 883.8 | 58.7 | 2711.5
J | 0.7 | 0.9 | 0.9 | 1.0 | 0.6 | 0.9 | 0.9 | 1.0
Healthy individuals: n = 10. H′: Shannon-Wiener diversity index; λ: Simpson's diversity index; 1/λ: inverse Simpson's diversity index; J: Pielou's evenness index.
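The indices in Table 3-3 can be computed from clonotype read counts with the standard definitions: with p_i the fraction of reads in clonotype i and S the number of clonotypes, H′ = −Σ p_i ln p_i, λ = Σ p_i², and J′ = H′ / ln S. The sketch below assumes the natural logarithm (consistent with the magnitudes in Table 3-3, though the source does not state the base) and the simple p_i² form of λ rather than the small-sample n(n−1)/N(N−1) variant; either choice may differ from what was actually used.

```python
import math
from typing import Dict, Mapping


def diversity_indices(counts: Mapping[object, int]) -> Dict[str, float]:
    """Shannon H', Simpson 1-lambda, inverse Simpson 1/lambda and Pielou J' from read counts."""
    values = [c for c in counts.values() if c > 0]
    total = sum(values)
    if total == 0:
        raise ValueError("no reads")
    p = [c / total for c in values]
    shannon = -sum(pi * math.log(pi) for pi in p)  # H', natural log (assumed)
    simpson = sum(pi * pi for pi in p)             # lambda: probability two reads share a clonotype
    richness = len(p)
    # J' is undefined for a single clonotype; NaN is returned in that case.
    pielou = shannon / math.log(richness) if richness > 1 else float("nan")
    return {"H'": shannon, "1-lambda": 1.0 - simpson,
            "1/lambda": 1.0 / simpson, "J'": pielou}
```

A repertoire dominated by one expanded clone, as in the LGL patient, raises λ and therefore lowers 1−λ, 1/λ, H′ and J′ together.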

FIG. 44 shows the distribution of the number of unique reads in TCRα and TCRβ chain repertoire analysis. The distribution was examined for the unique reads (base sequences not shared with other reads) among all sequence reads, with the number of copies on the horizontal axis. Reads detected only once (singletons) accounted for 73.3% (1,250 reads) of all unique reads for the TCRα chain and 70.5% (6,502 reads) for the TCRβ chain.
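The copy-number distribution in FIG. 44 and the singleton fraction can be computed by counting identical base sequences and then counting how many unique sequences occur once, twice, and so on. A minimal sketch, assuming the input is a list of read base sequences (function names are illustrative):

```python
from collections import Counter
from typing import Dict, Iterable


def copy_number_distribution(read_seqs: Iterable[str]) -> Dict[int, int]:
    """Map copy number -> number of unique reads observed with that many copies."""
    copies_per_unique = Counter(read_seqs)
    return dict(sorted(Counter(copies_per_unique.values()).items()))


def singleton_fraction(distribution: Dict[int, int]) -> float:
    """Fraction of unique reads that were detected only once."""
    n_unique = sum(distribution.values())
    return distribution.get(1, 0) / n_unique
```

With the figures above, 1,250 singletons out of 1,705 unique TCRα reads gives 0.733, and 6,502 out of 9,224 unique TCRβ reads gives 0.705.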

FIG. 45 shows the TRAV and TRAJ repertoires, that is, the usage frequency of each TRAV and TRAJ gene among all reads. The horizontal axis indicates TRAV genes (top graph) and TRAJ genes (bottom graph), and the vertical axis indicates the percentage of all reads (% Usage).

FIG. 46 shows a 3D plot of the TRA repertoire, in which the usage frequency of each combination of TRAV and TRAJ among all reads is shown in three dimensions. The horizontal axis indicates TRAJ genes, the depth axis indicates TRAV genes, and the vertical axis indicates usage frequency (% Usage). The combination of TRAV10 and TRAJ15 showed the highest usage frequency (12.53%).

FIG. 47 shows the TRBV and TRBJ repertoires, that is, the usage frequency of each TRBV and TRBJ gene among all reads. The horizontal axis indicates TRBV genes (top graph) and TRBJ genes (bottom graph), and the vertical axis indicates the percentage of all reads (% Usage).
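The usage frequencies plotted in FIGS. 45 to 47 can be derived by counting (V, J) pairs over all assigned reads and then summing over one gene position to obtain single-gene usage. The sketch below is illustrative; it assumes reads are already assigned to (V, J) pairs, and the resulting dictionary is the kind of V by J matrix a 3D bar plot would display (plotting itself is not shown).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (V gene, J gene)


def vj_usage(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """% Usage of each (V, J) combination among all reads."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


def single_gene_usage(vj: Dict[Pair, float], position: int) -> Dict[str, float]:
    """Collapse V-J usage to single-gene usage (position 0 = V gene, 1 = J gene)."""
    usage: Dict[str, float] = {}
    for pair, pct in vj.items():
        usage[pair[position]] = usage.get(pair[position], 0.0) + pct
    return usage
```

The most frequent combination, as reported for TRAV10/TRAJ15 above, is simply `max(vj, key=vj.get)`.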

FIG. 48 shows a 3D plot of the TRB repertoire, in which the usage frequency of each combination of TRBV and TRBJ among all reads is shown in three dimensions. The horizontal axis indicates TRBV genes, the depth axis indicates TRBJ genes, and the vertical axis indicates usage frequency (% Usage). The combination of TRBV29-1 and TRBJ2-7 showed the highest usage frequency (28.57%).

(Example 2 of analysis system: Analysis of T cells infiltrating large intestine tissue of HLA-A2402 colorectal cancer patients)

The present Example analyzed T cells infiltrating the large intestine tissue of HLA-A2402 colorectal cancer patients using the analysis system of the present invention.

(Materials and Methods)

Sample: Tumor tissue of colorectal cancer patients removed by surgical operation, and peripheral blood of healthy individuals

Method

(Collection and Storage of Colorectal Cancer Tissue)

Tumor tissue was collected at tumor resection surgery from 60 colorectal cancer patients. About 100 mg of tissue, corresponding to the size of a soybean, was collected from the cancer lesion of the resected organ. The tissue was cut into pieces 5 mm square and immediately immersed in an RNA stabilization reagent (RNAlater®, Ambion). After storage at 4° C. overnight, the RNAlater® was removed and the tissue was stored at −80° C.

(Isolation of Peripheral Blood of Healthy Individuals)

As a control, peripheral blood cells of healthy individuals were used. 5 mL of whole blood was collected from each of 10 healthy individuals in a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from 5×10⁶ isolated PBMCs using an RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany). The resulting RNA was quantified by absorbance at 260 nm (A260) using a spectrophotometer (Table 3-3A).

Table 3-3A Amount of total RNA in peripheral blood cells of healthy individuals

TABLE 3-3A1/3-3A2
Sample number | Elution volume (μL) | RNA concentration (ng/μL)
1 | 30 | 1682
2 | 30 | 274
3 | 30 | 1007
4 | 30 | 560
5 | 30 | 988
6 | 30 | 1327
7 | 30 | 667
8 | 30 | 258
9 | 30 | 597
10 | 30 | 624
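The concentrations in Table 3-3A follow from A260 readings, and total yield is concentration times elution volume. The sketch below uses the standard approximation that one A260 unit (1 cm path) corresponds to about 40 ng/μL of RNA; that factor, and the dilution factor parameter, are assumptions not stated in the source.

```python
RNA_NG_PER_UL_PER_A260 = 40.0  # ~40 ug/mL RNA per A260 unit, 1 cm path (standard approximation)


def rna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration from an A260 reading of a (possibly diluted) sample."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor


def total_rna_ug(conc_ng_per_ul: float, elution_ul: float) -> float:
    """Total RNA yield in micrograms for a given elution volume."""
    return conc_ng_per_ul * elution_ul / 1000.0
```

For sample 1 in Table 3-3A, 1682 ng/μL in 30 μL corresponds to about 50.5 μg of total RNA.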

(Examination of HLA Haplotype)

HLA-A typing was carried out to identify HLA expression and the HLA haplotype in the cancer tissue. A part of the cancer tissue immersed in RNAlater® was taken out, and genomic DNA was extracted using a QIAamp DNA Mini Kit (Qiagen, Germany). The DNA was then amplified and labeled using the WAKFlow HLA typing reagent HLA-A (Wakunaga) and analyzed with a Luminex system (Luminex Corp.). As a result, the HLA-A2402 gene was present in a homozygous or heterozygous state in 25 of the 60 specimens (Table 3-4).

Table 3-4 List of Colorectal Cancer Tissue Expressing HLA-A2402

TABLE 3-4-1/3-4-2
Number | Sex | Age | Diagnosis | Site of metastasis | Stage | HLA-A
1 | F | 70 | Progression in rectal cancer | | 1 | A*24:02, A*33:03
2 | F | 30 | Progression rectal tumor (Rs) | Lung | 3b | A*24:02, A*11:01
5 | F | 80 | Rectal cancer (Rs) | | 2 | A*24:02, —
8 | F | 60 | Sigmoid colon cancer (progressing) | | 2 | A*24:02, A*11:01
12 | M | 64 | Sigmoid colon cancer (progressing) | | 3a | A*24:02, —
13 | F | 75 | Sigmoid colon cancer (progressing) | | 1 | A*24:02, A*33:03
14 | F | 68 | Cecum cancer | | 1 | A*24:02, —
16 | F | 48 | Ascending colon cancer | Liver, lymph | 4 | A*24:02, A*02:06
25 | F | 60 | Cecum cancer | | 3a | A*24:02, —
27 | M | 79 | Rectal cancer (Rs) | | 2 | A*24:02, A*31:01
28 | M | 60 | Rectal cancer (Rs) | Peritoneum | 3b | A*24:02, A*11:01
29 | F | 52 | Sigmoid colon cancer | Ovary | 4 | A*24:02, A*02:06
30 | M | 58 | Rectal cancer | | 3b | A*24:02, A*02:01
31 | F | 72 | Sigmoid colon cancer (progressing) | | 3b | A*24:02, A*31:01
32 | M | 79 | Sigmoid colon cancer | | 2 | A*24:02, —
34 | M | 63 | Descending colon cancer | | 2 | A*24:02, —
35 | F | 77 | Sigmoid colon cancer (progressing) | | 3a | A*24:02, A*31:01
38 | M | 74 | Ascending colon cancer (progressing) | | 2 | A*24:02, A*33:03
39 | F | 74 | Rectal cancer (progressing) Rs | | 3a | A*24:02, A*26:03
41 | F | 74 | Rectal cancer (early) Rs | | 1 | A*24:02, A*11:01
42 | F | 67 | Ascending colon cancer (progressing) | | 3a | A*24:02, A*26:03
44 | M | 73 | Ascending colon cancer (progressing) | | 2 | A*24:02, A*26:01
54 | M | 70 | Colon cancer (progressing) Rs | | 3a | A*24:02, A*02:01
58 | M | 72 | Sigmoid colon cancer (early) | | 2 | A*24:02, —
59 | M | 70 | Sigmoid colon cancer (progressing) | | 2 | A*24:02, —

(RNA Extraction and Measurement of Amount of RNA)

To analyze the TCR repertoire in the 25 specimens carrying the HLA-A2402 gene, a portion of the tissue immersed in RNAlater® was taken out, and total RNA was extracted and purified using an RNeasy Lipid Tissue Mini Kit (QIAGEN, Germany). Elution from the column was carried out with 50 μL of RNase-free sterilized water. The amount of RNA obtained from each sample is shown in Table 3-5.

Table 3-5 Amount of Total RNA of Colorectal Cancer Sample

TABLE 3-5-1/3-5-2
Number | Sample number | Amount of RNA (ng/μL)
1 | HGS01 | 3765
2 | HGS02 | 2570
5 | HGS03 | 3603
8 | HGS04 | 3007
12 | HGS05 | 4843
13 | HGS06 | 1382
14 | HGS07 | 4577
16 | HGS08 | 2656
25 | HGS09 | 4219
27 | HGS10 | 6053
28 | HGS11 | 2541
29 | HGS12 | 2516
30 | HGS13 | 4319
31 | HGS14 | 4126
32 | HGS15 | 5039
34 | HGS16 | 3624
35 | HGS17 | 4459
38 | HGS18 | 4561
39 | HGS19 | 4088
41 | HGS20 | 2042
42 | HGS21 | 3554
44 | HGS23 | 3851
54 | HGS28 | 1089
58 | HGS29 | 2659
59 | HGS30 | 2981

(Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA samples were used to carry out adaptor-ligation PCR. First, to synthesize complementary DNA, the BSL-18E primer and 3.5 μL of RNA were mixed and annealed for 8 minutes at 70° C. After cooling on ice, reverse transcription was performed in the presence of an RNase inhibitor (RNasin) with the following composition to synthesize complementary DNA.

TABLE 3-1H Synthesis of complementary DNA
Reagent | Content (μL) | Final concentration
RNA solution | 3.5 |
200 μM BSL-18E | 1.5 | 30 μM
Total | 5 |
70° C., 8 minutes
5× First strand buffer | 2 | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂
0.1 M DTT | 1 | 10 mM
10 mM dNTPs | 0.5 | 500 μM
RNasin (Promega) | 0.5 | 2 U/μL
Superscript III™, 200 U/μL (Invitrogen) | 1 | 20 U/μL

The complementary DNA was then incubated for 2 hours at 16° C. in the following double-stranded DNA synthesis buffer in the presence of E. coli DNA polymerase I, E. coli DNA ligase and RNase H to synthesize double-stranded complementary DNA. T4 DNA polymerase was then added and reacted for 5 minutes at 16° C. to perform a 5′ terminal blunting reaction.

TABLE 3-1I1/3-1I2 Synthesis of complementary double stranded DNA
Reagent | Content (μL) | Final concentration
Complementary DNA reaction solution | 9 |
Sterilized water | 46.5 |
5× Second strand buffer | 15 | 25 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.15 mM β-NAD⁺, 1.2 mM DTT
10 mM dNTPs | 1.5 | 0.2 mM
E. coli DNA ligase, 10 U/μL (Invitrogen) | 0.5 | 0.067 U/μL
E. coli DNA polymerase, 10 U/μL (Invitrogen) | 2 | 0.27 U/μL
RNaseH, 2 U/μL (Invitrogen) | 0.5 | 0.013 U/μL
Total | 75 |
16° C., 2 hours
T4 DNA polymerase, 5 U/μL (Invitrogen) | 1 | 0.067 U/μL
16° C., 5 minutes

After column purification with a High Pure PCR Cleanup Micro Kit (Roche), the double stranded DNA was incubated overnight at 16° C. with the P20EA/10EA adaptor and T4 ligase in the following T4 ligase buffer for a ligation reaction.

TABLE 3-1J Adaptor adding reaction
Reagent | Content (μL) | Final concentration
Complementary double stranded DNA solution | 12.5 |
T4 ligase buffer | 5 | 50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM ATP, 5% PEG5000, 1 mM DTT
50 μM P20EA/10EA adaptor | 5 | 10 μM
T4 DNA ligase, 1 U/μL (Invitrogen) | 2.5 | 0.1 U/μL
Total | 25 | 16° C., overnight

The adaptor-added double stranded DNA, purified on a column as described above, was digested with a NotI restriction enzyme (50 U/μL, Takara) with the following composition to remove the adaptor added to the 3′ terminal.

TABLE 3-1K1 and 3-1K2 Restriction enzyme treatment
Reagent | Content (μL) | Final concentration
Complementary double stranded DNA solution | 34 |
10× restriction enzyme buffer | 0 | 50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM DTT, 100 mM NaCl
0.1% BSA | 5 | 0.01%
0.1% Triton X-100 | 5 | 0.01%
NotI, 50 U/μL (Takara) | 1 | 1 U/μL
Total | 50 | 37° C., 2 hours

Note: some data in this table were missing or illegible when filed.

5. PCR

The 1st PCR amplification was performed on the double stranded complementary DNA using the common adaptor primer P20EA and a TCRα chain or β chain C region specific primer (CB1). 20 cycles of PCR were performed with the composition shown below, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-1L 1st PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CB1 primer | 0.5 | 250 nM
Double stranded complementary DNA | 2 |
Sterilized water | 7 |

The 1st PCR amplicon was then used for nested PCR with the reaction composition shown below, using the P20EA primer and a TCRα chain or β chain C region specific primer (CB2). 20 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-1M1 and 3-1M2 2nd PCR amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CB2 primer | 1 | 500 nM
1st PCR amplicon | 2 |
Sterilized water | 6 |

Primers were removed from the resulting 2nd PCR amplicon with a High Pure PCR Cleanup Micro Kit (Roche). Analysis was then carried out on Roche's next generation sequencer (GS Junior Bench Top System), using the 2nd PCR amplicon diluted 10-fold as the template. Amplification used the B-P20EA primer, which is the P20EA adaptor primer with an adaptor B sequence added, together with the HuVaF and HuVbF primers, which are TCRα chain and β chain C region specific primers with an adaptor A sequence and a MID Tag sequence added. 10 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-1N 3rd amplification reaction composition
Reagent | Content (μL) | Final concentration
2× ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer | 1 | 500 nM
Each of the 10 μM HuVbF and HuVaF primers | 1 | 500 nM
2nd PCR amplicon | 1 |
Sterilized water | 7 |

6. Next Generation Sequencing

Next generation sequencing was carried out on Roche's GS Junior sequencer. Specifically, emPCR was carried out with a GS Junior Titanium emPCR Kit (Lib-L) according to the manufacturer's protocol at a ratio of DNA copies to beads (copies per bead, cpb) of 2. After emPCR, the beads recovered by bead enrichment were sequenced with the GS Junior Titanium Sequencing Kit and PicoTiterPlate Kit according to the manufacturer's protocol.
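The copies-per-bead setting matters because beads seeded with more than one template are expected to give mixed signals, while empty beads give no sequence. The sketch below assumes an idealized model that the source does not state: template copies are distributed over beads independently, following a Poisson distribution, with no capture losses. Under that assumption it estimates the expected bead fractions at cpb = 2. The capture_efficiency parameter and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a bead carries exactly k template copies (Poisson model)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bead_occupancy(cpb: float, capture_efficiency: float = 1.0) -> dict:
    """Idealized fractions of empty, clonal (one template) and mixed beads.

    capture_efficiency is a hypothetical parameter (not from the source) that
    scales the nominal copies per bead down to an effective mean.
    """
    lam = cpb * capture_efficiency
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


if __name__ == "__main__":
    for label, fraction in bead_occupancy(2.0).items():
        print(f"{label}: {fraction:.3f}")
```

With the defaults it prints empty: 0.135, clonal: 0.271 and mixed: 0.594. A capture efficiency below 1 lowers the effective mean, and with it the fraction of mixed beads. The bead enrichment step described above is intended to remove beads that carry no amplified DNA.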

7. Data Analysis

The resulting sequence data (SFF file) were sorted into read sequences for each MID Tag, and a sequence file in FASTA format was created using the software supplied with the GS Junior (sfffile or sffinfo). Repertoire analysis software (Repertoire Genesis) was then used to collate each read with the reference sequences in the IMGT database, assign its AV, BV, AJ and BJ regions, and determine its CDR3 sequence.
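In this example, the software supplied with the GS Junior (sfffile/sffinfo) performed the MID sorting. The minimal Python sketch below only illustrates what that sorting step does. It assumes that the MID tag sits at the very 5′ end of each read (after key removal) and allows an optional number of mismatches. The function names and the tag sequences used in the example are hypothetical placeholders, not taken from the source.

```python
from collections import defaultdict


def count_mismatches(prefix: str, tag: str) -> int:
    """Hamming distance between a read prefix and a tag; missing bases count as mismatches."""
    return sum(a != b for a, b in zip(prefix, tag)) + (len(tag) - len(prefix))


def assign_mid(seq: str, mids: dict, max_mismatches: int = 0):
    """Name of the unique best-matching MID at the 5' end of seq, or None."""
    scored = sorted((count_mismatches(seq[:len(tag)], tag), name) for name, tag in mids.items())
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two tags match equally well
    return scored[0][1]


def demultiplex(reads, mids: dict, max_mismatches: int = 0) -> dict:
    """Group (read_id, sequence) pairs by MID, trimming the tag from assigned reads."""
    groups = defaultdict(list)
    for read_id, seq in reads:
        name = assign_mid(seq, mids, max_mismatches)
        if name is None:
            groups["unassigned"].append((read_id, seq))
        else:
            groups[name].append((read_id, seq[len(mids[name]):]))
    return dict(groups)


def to_fasta(records, width: int = 60) -> str:
    """Render (read_id, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for read_id, seq in records:
        lines.append(f">{read_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, with placeholder tags {"S1": "ACGAGTGCGT", "S2": "ACGCTCGACA"}, the read "ACGAGTGCGTGGCC" is assigned to S1 and keeps "GGCC" after trimming. Reads that match no tag, or match two tags equally well, are kept as "unassigned" rather than guessed.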

8. Extraction of Overlapping Unique Reads in Analysis of 10 Healthy Individuals

As a normal control, TCR sequences of peripheral blood mononuclear cells from 10 healthy individuals were examined. Using the V, J and CDR3 sequences as an indicator, the TCRα and TCRβ sequence reads obtained from each healthy individual were searched for reads shared between individuals. The number of overlapping unique reads and the number of individuals sharing each such read were compared between TCRα and TCRβ chains (Table 3-6). The number of overlapping unique reads was markedly higher for TCRα chains than for TCRβ chains (809 vs. 39), and so was their proportion (2.37% vs. 0.19%). Further, TCRα reads overlapped in up to 8 out of 10 individuals, whereas every overlapping TCRβ read was shared by only 2 individuals. These results suggest that the TCRα repertoire is more similar among individuals.
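The overlap search described above amounts to grouping reads by their (V, J, CDR3) key and counting how many individuals carry each key. A minimal Python sketch follows; the function names are hypothetical. The text does not state the denominator behind the 2.37% and 0.19% figures, so the shared-fraction helper simply assumes the number of distinct clonotypes pooled across all individuals.

```python
from collections import Counter, defaultdict


def carriers_by_clonotype(repertoires: dict) -> dict:
    """Map each (V, J, CDR3) clonotype to the set of individuals carrying it.

    repertoires: individual ID -> iterable of (V, J, CDR3) tuples.
    """
    carriers = defaultdict(set)
    for individual, clonotypes in repertoires.items():
        for clonotype in set(clonotypes):  # count each clonotype once per individual
            carriers[clonotype].add(individual)
    return dict(carriers)


def overlap_histogram(carriers: dict) -> dict:
    """Number of unique clonotypes shared by exactly n individuals, for n >= 2."""
    counts = Counter(len(ids) for ids in carriers.values() if len(ids) >= 2)
    return dict(sorted(counts.items()))


def shared_fraction(carriers: dict) -> float:
    """Shared clonotypes as a fraction of all distinct clonotypes (assumed denominator)."""
    if not carriers:
        return 0.0
    shared = sum(1 for ids in carriers.values() if len(ids) >= 2)
    return shared / len(carriers)
```

The histogram has the same shape as Table 3-6: each key is a number of individuals, and each value is the number of unique reads shared by exactly that many individuals.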

Table 3-6 Number of Overlapping Unique Reads in Healthy Individuals

TABLE 3-6
Number of individuals (total of 10 cases) | Number of overlapping unique reads, TCRα | Number of overlapping unique reads, TCRβ
2 | 684 | 39
3 | 87 | 0
4 | 18 | 0
5 | 10 | 0
6 | 4 | 0
7 | 4 | 0
8 | 2 | 0
9 | 0 | 0
10 | 0 | 0
Total overlapping reads (percentage) | 809 (2.37%) | 39 (0.19%)

9. Analysis of Overlapping Reads in TCRα Chain

The base sequences of the overlapping reads were examined in detail for the TCRα chain, which showed a higher level of overlap among individuals than the TCRβ chain. Many of the highly overlapping TCR reads were found to be TCRα genes derived from mucosal-associated invariant T cells (MAIT) or natural killer T cells (NKT), both of which are known to express an invariant chain (Table 3-7). NKT cells mainly express TRAV10 (Vα24)-TRAJ18, and MAITs mainly express a TCR consisting of TRAV1-2 (Vα7.2)-TRAJ33. It has recently been reported that the MAIT TCR recognizes bacterial vitamin B metabolites presented by the MR1 molecule, and MAITs have drawn attention for their role in immunosurveillance (Nature. 2012 Nov. 29; 491(7426): 717-23; J Exp Med. 2013 Oct. 21; 210(11): 2305-20). When overlapping reads shared by 4 or more individuals were collated with previously reported invariant TCRs, 45% of them were found to be invariant TCRs (Table 3-7). Highly overlapping reads occur in TCRα chains, but no TCRβ read was shared by more than 2 individuals (Table 3-8). The high level of overlap in TCRα is therefore attributed to the presence of invariant TCRs. Among the 38 types of highly overlapping reads shared by 4 or more individuals, 21 types of TCRα reads could not be collated with a previously reported invariant TCR (Table 3-9). This suggests that they may be novel invariant TCRs.
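The collation step can be sketched as two operations. The first looks up the V/J pairing of each shared clonotype against known invariant pairings. The second applies a cutoff of 4 or more individuals, as used for Table 3-9. Counting the Cell column of Table 3-7 gives 17 MAIT or NKT rows out of 38, about 45%, which matches the figure in the text.

The Python sketch below is only an illustration. The text collated reads against previously reported invariant TCRs, whereas this sketch assumes that the V/J pairs are enough to identify them, which is a simplification. The pairs used are TRAV1-2/TRAJ33, TRAV1-2/TRAJ20 and TRAV1-2/TRAJ12 (labelled MAIT in Table 3-7) and TRAV10/TRAJ18 (NKT). The names are hypothetical.

```python
# V/J pairings labelled as invariant in Table 3-7 (assumption: V/J alone suffices).
INVARIANT_VJ = {
    ("TRAV1-2", "TRAJ33"): "MAIT",
    ("TRAV1-2", "TRAJ20"): "MAIT",
    ("TRAV1-2", "TRAJ12"): "MAIT",
    ("TRAV10", "TRAJ18"): "NKT",
}


def split_invariant(shared_counts: dict, min_individuals: int = 4) -> tuple:
    """Split clonotypes shared by >= min_individuals into known invariant TCRs and candidates.

    shared_counts: (V, J, CDR3) -> number of individuals carrying the clonotype.
    Returns (known, candidates), each ordered by decreasing number of individuals.
    """
    known, candidates = [], []
    for (v, j, cdr3), n in sorted(shared_counts.items(), key=lambda kv: -kv[1]):
        if n < min_individuals:
            continue
        label = INVARIANT_VJ.get((v, j))
        if label:
            known.append((v, j, cdr3, n, label))
        else:
            candidates.append((v, j, cdr3, n))
    return known, candidates
```

Clonotypes that pass the cutoff but match no known pairing end up in the candidate list. This corresponds to the role Table 3-9 plays in the text, though here it rests only on the simplified V/J rule.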

TABLE 3-7 Overlapping TCRα chain read sequences in healthy individuals
SEQ ID NO | Number | n (individuals) | TRAV | TRAJ | CDR3 | % Read (Mean) | Cell
1550 | 1 | 8 | TRAV1-2 | TRAJ33 | CAVMDSNYQLIWG | 0.24 | MAIT
1551 | 2 | 8 | TRAV1-2 | TRAJ20 | CAVRDGDYKLSFG | 0.08 | MAIT
1552 | 3 | 7 | TRAV9-2 | TRAJ53 | CALSGGSNYKLTFG | 0.04 |
1553 | 4 | 7 | TRAV13-2 | TRAJ9 | CAENTGGFKTIFG | 0.03 |
1554 | 5 | 7 | TRAV1-2 | TRAJ33 | CAVRDSNYQLIWG | 0.23 | MAIT
1555 | 6 | 7 | TRAV1-2 | TRAJ12 | CAVMDSSYKLIFG | 0.12 | MAIT
1556 | 7 | 6 | TRAV1-2 | TRAJ33 | CAVTDSNYQLIWG | 0.05 | MAIT
1557 | 8 | 6 | TRAV1-2 | TRAJ33 | CAVLDSNYQLIWG | 0.11 | MAIT
1558 | 9 | 6 | TRAV1-2 | TRAJ20 | CAVRDRDYKLSFG | 0.06 | MAIT
1559 | 10 | 6 | TRAV10 | TRAJ18 | CVVSDRGSTLGRLYFG | 0.38 | NKT
1560 | 11 | 5 | TRAV9-2 | TRAJ23 | CALIYNQGGKLIFG | 0.02 |
1561 | 12 | 5 | TRAV9-2 | TRAJ20 | CALNDYKLSFG | 0.05 |
1562 | 13 | 5 | TRAV13-2 | TRAJ53 | CAENSGGSNYKLTFG | 0.04 |

TABLE 3-7-2
1563 | 14 | 5 | TRAV13-2 | TRAJ39 | CAENNAGNMLTFG | 0.04 |
1564 | 15 | 5 | TRAV12-2 | TRAJ8 | CAVNTGFQKLVFG | 0.03 |
1565 | 16 | 5 | TRAV12-2 | TRAJ20 | CAVNDYKLSFG | 0.02 |
1566 | 17 | 5 | TRAV12-1 | TRAJ31 | CVVNNARLMFG | 0.02 |
1567 | 18 | 5 | TRAV1-2 | TRAJ33 | CAVKDSNYQLIWG | 0.13 | MAIT
1568 | 19 | 5 | TRAV1-2 | TRAJ33 | CAAMDSNYQLIWG | 0.08 | MAIT
1569 | 20 | 5 | TRAV1-2 | TRAJ33 | CAALDSNYQLIWG | 0.04 | MAIT
1570 | 21 | 4 | TRAV9-2 | TRAJ6 | CALSGGSYIPTFG | 0.06 |
1571 | 22 | 4 | TRAV9-2 | TRAJ42 | CALSDYGGSQGNLIFG | 0.05 |
1572 | 23 | 4 | TRAV9-2 | TRAJ35 | CALIGFGNVLHCG | 0.02 |
1573 | 24 | 4 | TRAV2 | TRAJ9 | CAVEEGTGGFKTIFG | 0.02 |
1574 | 25 | 4 | TRAV13-2 | TRAJ44 | CAENTGTASKLTFG | 0.02 |
1575 | 26 | 4 | TRAV13-1 | TRAJ53 | CAASGGSNYKLTFG | 0.03 |
1576 | 27 | 4 | TRAV12-2 | TRAJ6 | CAVSGGSYIPTFG | 0.04 |
1577 | 28 | 4 | TRAV12-2 | TRAJ30 | CAVNRDDKIIFG | 0.04 |
1578 | 29 | 4 | TRAV12-2 | TRAJ15 | CAVNQAGTALIFG | 0.05 |
1579 | 30 | 4 | TRAV12-2 | TRAJ15 | CAVNNQAGTALIFG | 0.02 |
1580 | 31 | 4 | TRAV12-1 | TRAJ49 | CVVNTGNQFYFG | 0.03 |
1581 | 32 | 4 | TRAV12-1 | TRAJ15 | CVVNQAGTALIFG | 0.12 |
1582 | 33 | 4 | TRAV1-2 | TRAJ33 | CAVVDSNYQLIWG | 0.05 | MAIT
1583 | 34 | 4 | TRAV1-2 | TRAJ33 | CAVSDSNYQLIWG | 0.07 | MAIT
1584 | 35 | 4 | TRAV1-2 | TRAJ33 | CASMDSNYQLIWG | 0.03 | MAIT
1585 | 36 | 4 | TRAV1-2 | TRAJ33 | CAPMDSNYQLIWG | 0.05 | MAIT
1586 | 37 | 4 | TRAV1-2 | TRAJ12 | CAVRDSSYKLIFG | 0.03 | MAIT
1587 | 38 | 4 | TRAV1-2 | TRAJ12 | CAVLDSSYKLIFG | 0.02 | MAIT
MAIT: Mucosal-associated invariant T cells, NKT: Natural killer T cells

Table 3-8 Overlapping TCRβ Chain Read Sequences in Healthy Individuals

TABLE 3-8-1
SEQ ID NO | Number | n (individuals) | TRBV | TRBJ | CDR3 | % Read (Mean)
1588 | 1 | 2 | TRBV9 | TRBJ2-5 | CASSVLRGGDPSTS | 0.051
1589 | 2 | 2 | TRBV9 | TRBJ2-5 | CASSGEVGRETQYFG | 0.051
1590 | 3 | 2 | TRBV9 | TRBJ2-2 | CASSYWGTGELFFG | 0.051
1591 | 4 | 2 | TRBV9 | TRBJ2-2 | CASSYWGPGSCFL | 0.051

TABLE 3-8-2
1592 | 5 | 2 | TRBV9 | TRBJ1-5 | XASSVGTKDQPQHFG | 0.051
1593 | 6 | 2 | TRBV9 | TRBJ1-2 | CASXVATWTGTATPS | 0.051
1594 | 7 | 2 | TRBV9 | TRBJ1-2 | CASSVATLDRDGYTFG | 0.051
1595 | 8 | 2 | TRBV7-9 | TRBJ2-7 | CASTQSTGLRVLLR | 0.051
1596 | 9 | 2 | TRBV7-9 | TRBJ2-7 | CASSYRAATYEQYFG | 0.051
1597 | 10 | 2 | TRBV7-9 | TRBJ2-7 | CASSLGLAGARYEQYFG | 0.051
1598 | 11 | 2 | TRBV7-9 | TRBJ2-7 | CASSFRDSYEQYFG | 0.051
1599 | 12 | 2 | TRBV7-9 | TRBJ2-6 | CASTGAPGANVLTFG | 0.051
1600 | 13 | 2 | TRBV7-9 | TRBJ2-6 | CASSFPVPSGANVLTFG | 0.051
1601 | 14 | 2 | TRBV7-9 | TRBJ2-6 | CASRAEHGSRRRGQRPDFR | 0.051
1602 | 15 | 2 | TRBV7-9 | TRBJ2-5 | CASSLVGETQYFG | 0.029
1603 | 16 | 2 | TRBV7-9 | TRBJ2-1 | CASSLGTSGSRNEQFFG | 0.041
1604 | 17 | 2 | TRBV6-4 | TRBJ2-3 | CASSDSTTDTQYFG | 0.030
1605 | 18 | 2 | TRBV6-4 | TRBJ2-3 | CASSDGTGGTDTQYFG | 0.039
1606 | 19 | 2 | TRBV4-1 | TRBJ2-5 | CASSQGQGETQYFG | 0.057
1607 | 20 | 2 | TRBV29-1 | TRBJ2-7 | CSVGQGPNEQYFG | 0.044
1608 | 21 | 2 | TRBV29-1 | TRBJ2-7 | CSVAGTGAYEQYFG | 0.043
1609 | 22 | 2 | TRBV29-1 | TRBJ2-7 | CSVAASYEQYFG | 0.030
1610 | 23 | 2 | TRBV29-1 | TRBJ2-5 | CSVERETQYFG | 0.070
1611 | 24 | 2 | TRBV29-1 | TRBJ2-5 | CSVDSKETQYFG | 0.067
1612 | 25 | 2 | TRBV29-1 | TRBJ2-3 | CSVEGSTDTQYFG | 0.030
1613 | 26 | 2 | TRBV29-1 | TRBJ2-3 | CSVEEGTDTQYFG | 0.073
1614 | 27 | 2 | TRBV29-1 | TRBJ2-2 | CSVVGTGELFFG | 0.040
1615 | 28 | 2 | TRBV29-1 | TRBJ2-1 | CSVAGTSGYNEQFFG | 0.030
1616 | 29 | 2 | TRBV29-1 | TRBJ1-3 | CSVGTGNTIYFG | 0.058
1617 | 30 | 2 | TRBV29-1 | TRBJ1-2 | CSVRGGNYGYTFG | 0.041
1618 | 31 | 2 | TRBV29-1 | TRBJ1-2 | CSVGSGSYGYTFG | 0.042
1619 | 32 | 2 | TRBV28 | TRBJ2-7 | CASSPSYEQYFG | 0.030
1620 | 33 | 2 | TRBV28 | TRBJ2-5 | CASSLRGQETQYFG | 0.028
1621 | 34 | 2 | TRBV28 | TRBJ2-5 | CASSLRETQYFG | 0.045
1622 | 35 | 2 | TRBV28 | TRBJ2-2 | CASSLLTGELFFG | 0.029
1623 | 36 | 2 | TRBV20-1 | TRBJ2-7 | CSASGTSVSYEQYFG | 0.072
1624 | 37 | 2 | TRBV2 | TRBJ2-1 | CASSDNEQFFG | 0.038
1625 | 38 | 2 | TRBV15 | TRBJ2-5 | CATSRDLGETQYFG | 0.044
1626 | 39 | 2 | TRBV12-3 | TRBJ1-1 | CASSLAGNTEAFFG | 0.059

Table 3-9 Invariant TCR Candidate Genes

TABLE 3-9
SEQ ID NO | Number | TRAV | TRAJ | CDR3
1627 | 1 | TRAV9-2 | TRAJ53 | CALSGGSNYKLTFG
1628 | 2 | TRAV13-2 | TRAJ9 | CAENTGGFKTIFG
1629 | 3 | TRAV9-2 | TRAJ23 | CALIYNQGGKLIFG
1630 | 4 | TRAV9-2 | TRAJ20 | CALNDYKLSFG
1631 | 5 | TRAV13-2 | TRAJ53 | CAENSGGSNYKLTFG
1632 | 6 | TRAV13-2 | TRAJ39 | CAENNAGNMLTFG
1633 | 7 | TRAV12-2 | TRAJ8 | CAVNTGFQKLVFG
1634 | 8 | TRAV12-2 | TRAJ20 | CAVNDYKLSFG
1635 | 9 | TRAV12-1 | TRAJ31 | CVVNNARLMFG
1636 | 10 | TRAV9-2 | TRAJ6 | CALSGGSYIPTFG
1637 | 11 | TRAV9-2 | TRAJ42 | CALSDYGGSQGNLIFG
1638 | 12 | TRAV9-2 | TRAJ35 | CALIGFGNVLHCG
1639 | 13 | TRAV2 | TRAJ9 | CAVEEGTGGFKTIFG
1640 | 14 | TRAV13-2 | TRAJ44 | CAENTGTASKLTFG
1641 | 15 | TRAV13-1 | TRAJ53 | CAASGGSNYKLTFG
1642 | 16 | TRAV12-2 | TRAJ6 | CAVSGGSYIPTFG
1643 | 17 | TRAV12-2 | TRAJ30 | CAVNRDDKIIFG
1644 | 18 | TRAV12-2 | TRAJ15 | CAVNQAGTALIFG
1645 | 19 | TRAV12-2 | TRAJ15 | CAVNNQAGTALIFG
1646 | 20 | TRAV12-1 | TRAJ49 | CVVNTGNQFYFG
1647 | 21 | TRAV12-1 | TRAJ15 | CVVNQAGTALIFG

10. Analysis of Overlapping Reads in Colorectal Cancer Patient Tissue

Cancer antigen specific T cells are known to be present in the cancer tissue of cancer patients and to play an important role in the antitumor effect. To identify cancer antigen specific TCR genes, the TCR repertoire of patients with a specific HLA is analyzed to find TCR genes that expand in response to a specific antigen. In this experiment, TCR repertoire analysis was carried out on cancer tissue from 25 colorectal cancer patients sharing HLA-A2402, to search for unique reads shared among the cancer patient samples (Table 3-10). As a result, 213 TCRα chain reads (1.65%) and 49 TCRβ chain reads (0.11%) were found to be shared by multiple patients. As in healthy individuals, TCRα reads were shared by up to 12 of the 25 cases, whereas TCRβ reads were shared by at most 3 cases. Seven TCRα reads were shared by the cancer tissue of 4 or more patients. All but one of them were MAIT-derived TCRα chains using TRAV1-2, five with TRAJ33 and one with TRAJ20 (Table 3-11).

Table 3-10 Number of Cancer Specific Reads and Number of Overlapping Unique Reads in Cancer Tissue

TABLE 3-10
Number of individuals (total of 25 cases) | TCRα: number of overlapping reads | TCRα: number of cancer specific reads | TCRβ: number of overlapping reads | TCRβ: number of cancer specific reads
2 | 192 | 150 | 47 | 46
3 | 14 | 7 | 2 | 2
4 | 2 | 0 | 0 | 0
5 | 1 | 0 | 0 | 0
6 | 2 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0
9 | 0 | 0 | 0 | 0
10 | 1 | 0 | 0 | 0
11 | 0 | 0 | 0 | 0
12 | 1 | 0 | 0 | 0
Total overlapping reads (percentage) | 213 (1.65%) | 157 (1.22%) | 49 (0.11%) | 48 (0.11%)

Table 3-11 Overlapping TCRα Read Sequences and Cancer Specific TCRα Reads in Cancer Patients

TABLE 3-11-1
SEQ ID NO | No. | TRAV | TRAJ | CDR3 | No. of cancer patients with overlap | No. of healthy individuals with overlap | % Read (Mean) | TCR type
1648 | 1 | TRAV1-2 | TRAJ33 | CAVMDSNYQLIWG | 12 | 8 | 0.36 | MAIT (mucosal-associated invariant T)

TABLE 3-11-2
1649 | 2 | TRAV1-2 | TRAJ33 | CAVRDSNYQLIWG | 10 | 7 | 0.26 | MAIT
1650 | 3 | TRAV1-2 | TRAJ33 | CAVLDSNYQLIWG | 6 | 6 | 0.38 | MAIT
1651 | 4 | TRAV1-2 | TRAJ20 | CAVRDGDYKLSFG | 6 | 8 | 0.38 | MAIT
1652 | 5 | TRAV13-1 | TRAJ16 | CAASKGGQKLLFA | 5 | 2 | 0.28 | Cancer specific
1653 | 6 | TRAV1-2 | TRAJ33 | CAVKDSNYQLIWG | 4 | 5 | 0.11 | MAIT
1654 | 7 | TRAV1-2 | TRAJ33 | CAAMDSNYQLIWG | 4 | 5 | 0.41 | MAIT
1655 | 8 | TRAV9-2 | TRAJ57 | CALTQGGSEKLVFG | 3 | 0 | 0.40 | Cancer specific
1656 | 9 | TRAV9-2 | TRAJ36 | CALSDQTGANNLFFG | 3 | 1 | 0.11 | Cancer specific
1657 | 10 | TRAV9-2 | TRAJ34 | CALRKVRHRQAHLW | 3 | 0 | 0.39 | Cancer specific
1658 | 11 | TRAV9-2 | TRAJ21 | CALRGYNFNKFYFG | 3 | 0 | 0.13 | Cancer specific
1659 | 12 | TRAV9-2 | TRAJ20 | CALNDYKLSFG | 3 | 5 | 0.42 | Cancer specific
1660 | 13 | TRAV4 | TRAJ5 | CLVGDRDTGRRALTFG | 3 | 0 | 0.20 | Cancer specific
1661 | 14 | TRAV38-2 | TRAJ45 | CAYRSYSGGGADGLTFG | 3 | 0 | 0.42 | Cancer specific
1662 | 15 | TRAV21 | TRAJ29 | CAVSGNTPLVFG | 3 | 1 | 0.36 | Cancer specific
1663 | 16 | TRAV21 | TRAJ26 | CAVYGQNFVFG | 3 | 0 | 0.36 | Cancer specific
1664 | 17 | TRAV13-2 | TRAJ22 | CAERVSSGSARQLTFG | 3 | 0 | 1.21 | Cancer specific
1665 | 18 | TRAV12-3 | TRAJ11 | CAMNSGYSTLTFG | 3 | 1 | 0.12 | Cancer specific
1666 | 19 | TRAV1-2 | TRAJ33 | CAALDSNYQLIWG | 3 | 5 | 0.08 | MAIT
1667 | 20 | TRAV1-2 | TRAJ12 | CAVMDSSYKLIFG | 3 | 7 | 0.20 | MAIT
1668 | 21 | TRAV10 | TRAJ18 | CVVSDRGSTLGRLYFG | 3 | 6 | 0.21 | NKT
1669 | 22 | TRAV9-2 | TRAJ6 | CALSLSGGSYIPTFG | 2 | 1 | 0.13 | Cancer specific
1670 | 23 | TRAV9-2 | TRAJ54 | CALSDRGAQKLVFG | 2 | 1 | 0.26 | Cancer specific
1671 | 24 | TRAV9-2 | TRAJ54 | CALIIGREPRSWYL | 2 | 0 | 0.13 | Cancer specific
1672 | 25 | TRAV9-2 | TRAJ54 | CALIIGEGAQKLVFG | 2 | 0 | 0.14 | Cancer specific
1673 | 26 | TRAV9-2 | TRAJ53 | CALSGSGGSNYKLTFG | 2 | 3 | 0.56 | Cancer specific
1674 | 27 | TRAV9-2 | TRAJ53 | CALSDLSGGSNYKLTFG | 2 | 0 | 0.04 | Cancer specific
1675 | 28 | TRAV9-2 | TRAJ52 | CALRAGGTSYGKLTFG | 2 | 2 | 0.69 | Cancer specific
1676 | 29 | TRAV9-2 | TRAJ5 | CALTLTMGRRALTFG | 2 | 0 | 0.11 | Cancer specific
1677 | 30 | TRAV9-2 | TRAJ48 | CALDFGNEKLTFG | 2 | 0 | 0.09 | Cancer specific
1678 | 31 | TRAV9-2 | TRAJ45 | CAPPPHGLTFG | 2 | 0 | 0.21 | Cancer specific
1679 | 32 | TRAV9-2 | TRAJ45 | CALSYSGGGADGLTFG | 2 | 0 | 0.28 | Cancer specific
1680 | 33 | TRAV9-2 | TRAJ45 | CALRGGGADGLTFG | 2 | 0 | 0.17 | Cancer specific
1681 | 34 | TRAV9-2 | TRAJ44 | CALNTGTASKLTFG | 2 | 1 | 0.27 | Cancer specific
1682 | 35 | TRAV9-2 | TRAJ43 | CALSDRNNDMRFG | 2 | 0 | 2.78 | Cancer specific
1683 | 36 | TRAV9-2 | TRAJ39 | CALRAGNMLTFG | 2 | 0 | 0.08 | Cancer specific

TABLE 3-11-3
1684 | 37 | TRAV9-2 | TRAJ39 | CALLNNAGNMLTFG | 2 | 1 | 0.17 | Cancer specific
1685 | 38 | TRAV9-2 | TRAJ37 | CALSSNTGKLIFG | 2 | 1 | 0.48 | Cancer specific
1686 | 39 | TRAV9-2 | TRAJ34 | CALSDNTDKLIFG | 2 | 2 | 0.09 | Cancer specific
1687 | 40 | TRAV9-2 | TRAJ34 | CALIDTDKLIFG | 2 | 0 | 0.09 | Cancer specific
1688 | 41 | TRAV9-2 | TRAJ32 | CALSGGATNKLIFG | 2 | 1 | 0.17 | Cancer specific
1689 | 42 | TRAV9-2 | TRAJ31 | CALTSNARLMFG | 2 | 0 | 1.16 | Cancer specific
1690 | 43 | TRAV9-2 | TRAJ31 | CALNNNARLMFG | 2 | 1 | 0.07 | Cancer specific
1691 | 44 | TRAV9-2 | TRAJ3 | CALSSYSSASKIIFG | 2 | 0 | 0.33 | Cancer specific
1692 | 45 | TRAV9-2 | TRAJ3 | CALSHSSASKIIFG | 2 | 0 | 0.05 | Cancer specific
1693 | 46 | TRAV9-2 | TRAJ3 | CALSDRRSSASKIIFG | 2 | 0 | 0.10 | Cancer specific
1694 | 47 | TRAV9-2 | TRAJ3 | CALRDSSASKIIFG | 2 | 3 | 0.10 | Cancer specific
1695 | 48 | TRAV9-2 | TRAJ29 | CALVSGNTPLVFG | 2 | 0 | 0.14 | Cancer specific
1696 | 49 | TRAV9-2 | TRAJ29 | CALRGSGNTPLVFG | 2 | 0 | 0.09 | Cancer specific
1697 | 50 | TRAV9-2 | TRAJ27 | CALSDRDTNAGKSTFG | 2 | 1 | 0.37 | Cancer specific
1698 | 51 | TRAV9-2 | TRAJ27 | CALNTNAGKSTFG | 2 | 2 | 0.12 | Cancer specific
1699 | 52 | TRAV9-2 | TRAJ23 | CALSFYNQGGKLIFG | 2 | 1 | 0.08 | Cancer specific
1700 | 53 | TRAV9-2 | TRAJ23 | CALSDYNQGGKLIFG | 2 | 2 | 0.10 | Cancer specific
1701 | 54 | TRAV9-2 | TRAJ23 | CALPIYNQGGKLIFG | 2 | 0 | 0.07 | Cancer specific
1702 | 55 | TRAV9-2 | TRAJ22 | CALAGSARQLTFG | 2 | 0 | 0.05 | Cancer specific
1703 | 56 | TRAV9-2 | TRAJ21 | CAPRYNFNKFYFG | 2 | 0 | 0.25 | Cancer specific
1704 | 57 | TRAV9-2 | TRAJ20 | CALSGDDYKLSFG | 2 | 0 | 0.04 | Cancer specific
1705 | 58 | TRAV9-2 | TRAJ17 | CALSDKAAGNKLTFG | 2 | 0 | 0.24 | Cancer specific
1706 | 59 | TRAV9-2 | TRAJ17 | CALFKAAGNKLTFG | 2 | 0 | 0.50 | Cancer specific
1707 | 60 | TRAV9-2 | TRAJ16 | CALSDRDGQKLLFA | 2 | 1 | 0.18 | Cancer specific
1708 | 61 | TRAV9-2 | TRAJ15 | CALSGQAGTALIFG | 2 | 1 | 0.31 | Cancer specific
1709 | 62 | TRAV9-2 | TRAJ15 | CALSAVEAGTALIFG | 2 | 0 | 0.82 | Cancer specific
1710 | 63 | TRAV9-2 | TRAJ13 | CALTPSGGYQKVTFG | 2 | 0 | 0.08 | Cancer specific
1711 | 64 | TRAV9-2 | TRAJ10 | CALGGAGGGNKLTFG | 2 | 0 | 0.24 | Cancer specific
1712 | 65 | TRAV8-3 | TRAJ44 | CAVVIETTGTASKLTFG | 2 | 0 | 0.62 | Cancer specific
1713 | 66 | TRAV8-3 | TRAJ43 | CAVGALNNNDMRFG | 2 | 3 | 0.43 | Cancer specific
1714 | 67 | TRAV8-3 | TRAJ17 | CAVGAAAGNKLTFG | 2 | 0 | 0.28 | Cancer specific
1715 | 68 | TRAV8-2 | TRAJ4 | CVVSLSGGYNKLILE | 2 | 0 | 0.43 | Cancer specific
1716 | 69 | TRAV8-1 | TRAJ27 | CAVQWWCYNKLIFG | 2 | 0 | 0.17 | Cancer specific
1717 | 70 | TRAV6 | TRAJ6 | CALGSDGGSYIPTFG | 2 | 0 | 0.67 | Cancer specific
1718 | 71 | TRAV6 | TRAJ37 | CALDISGNTGKLIFG | 2 | 0 | 0.08 | Cancer specific

TABLE 3-11-4
1719 | 72 | TRAV6 | TRAJ34 | CALGRFRQAHLW | 2 | 0 | 0.21 | Cancer specific
1720 | 73 | TRAV5 | TRAJ5 | CAESRLTLYGHGQESTYFW | 2 | 0 | 0.10 | Cancer specific
1721 | 74 | TRAV5 | TRAJ5 | CAERDTGRRALTFG | 2 | 0 | 0.45 | Cancer specific
1722 | 75 | TRAV5 | TRAJ36 | CAESKRTGANNLFFG | 2 | 0 | 0.59 | Cancer specific
1723 | 76 | TRAV4 | TRAJ9 | CLVGVEASKLSL | 2 | 0 | 0.05 | Cancer specific
1724 | 77 | TRAV4 | TRAJ40 | CLVGTTSGTYKYIFG | 2 | 0 | 1.64 | Cancer specific
1725 | 78 | TRAV4 | TRAJ37 | CLVGDTSNTGKLIFG | 2 | 0 | 0.14 | Cancer specific
1726 | 79 | TRAV4 | TRAJ37 | CLDTSNTGKLIFG | 2 | 0 | 0.36 | Cancer specific
1727 | 80 | TRAV4 | TRAJ22 | CLLTGSARQLTFG | 2 | 0 | 0.07 | Cancer specific
1728 | 81 | TRAV38-2 | TRAJ32 | CAYRSGYGGATNKLIFG | 2 | 0 | 0.08 | Cancer specific
1729 | 82 | TRAV38-2 | TRAJ31 | CAYRRRNNNARLMFG | 2 | 0 | 2.47 | Cancer specific
1730 | 83 | TRAV38-1 | TRAJ33 | CAFMKHDWDSNYQLIWG | 2 | 0 | 0.25 | Cancer specific
1731 | 84 | TRAV38-1 | TRAJ32 | CAFMTPGGATNKLIFG | 2 | 0 | 0.17 | Cancer specific
1732 | 85 | TRAV36 | TRAJ54 | CAAIQGAQKLVFG | 2 | 0 | 0.32 | Cancer specific
1733 | 86 | TRAV35 | TRAJ58 | CAGRPETSGSRLTFG | 2 | 0 | 0.11 | Cancer specific
1734 | 87 | TRAV35 | TRAJ53 | CAGQGGGSNYKLTFG | 2 | 0 | 0.15 | Cancer specific
1735 | 88 | TRAV35 | TRAJ28 | CAGQESGAGSYQLTFG | 2 | 0 | 0.07 | Cancer specific
1736 | 89 | TRAV35 | TRAJ26 | CAGPDNYGQNFVFG | 2 | 0 | 0.18 | Cancer specific
1737 | 90 | TRAV3 | TRAJ42 | CAVRDMRYGGSQGNLIFG | 2 | 0 | 0.22 | Cancer specific
1738 | 91 | TRAV3 | TRAJ4 | CAVRDSGGYNKLYFW | 2 | 0 | 0.23 | Cancer specific
1739 | 92 | TRAV3 | TRAJ4 | CAVRDSGGYNKLIFG | 2 | 0 | 1.27 | Cancer specific
1740 | 93 | TRAV3 | TRAJ29 | CAVRAVNSGNTPLVFG | 2 | 0 | 0.09 | Cancer specific
1741 | 94 | TRAV29 | TRAJ40 | CAASDSGTYKYIFG | 2 | 1 | 0.09 | Cancer specific
1742 | 95 | TRAV29 | TRAJ29 | CAA1EGNTPLVFG | 2 | 0 | 0.50 | Cancer specific
1743 | 96 | TRAV26-2 | TRAJ7 | CTNPLGGNNRLAFG | 2 | 0 | 0.31 | Cancer specific
1744 | 97 | TRAV26-2 | TRAJ44 | CILRDNTGTASKLTFG | 2 | 0 | 0.83 | Cancer specific
1745 | 98 | TRAV26-2 | TRAJ35 | CILGGVWECAALR | 2 | 0 | 0.07 | Cancer specific
1746 | 99 | TRAV26-2 | TRAJ35 | CILGGFGNVLHCG | 2 | 0 | 0.91 | Cancer specific
1747 | 100 | TRAV26-2 | TRAJ32 | CILRVVLQTSSSL | 2 | 0 | 0.19 | Cancer specific
1748 | 101 | TRAV26-2 | TRAJ23 | CILRDGHNQGGKLIFG | 2 | 0 | 0.15 | Cancer specific
1749 | 102 | TRAV26-2 | TRAJ21 | CILMNNFNKFTLD | 2 | 0 | 0.14 | Cancer specific
1750 | 103 | TRAV26-2 | TRAJ18 | CILTQRLNSGRLYFG | 2 | 0 | 0.17 | Cancer specific
1751 | 104 | TRAV26-2 | TRAJ18 | CILTQRLNSGEAILW | 2 | 0 | 0.75 | Cancer specific
1752 | 105 | TRAV26-1 | TRAJ57 | CIVRVAQGGSEKLVFG | 2 | 1 | 0.14 | Cancer specific
1753 | 106 | TRAV26-1 | TRAJ52 | CIVRVSAGGTSYGKLTFG | 2 | 0 | 0.14 | Cancer specific

TABLE 3-11-5
1754 | 107 | TRAV26-1 | TRAJ5 | CIVTAYTGRRALTLG | 2 | 0 | 0.07 | Cancer specific
1755 | 108 | TRAV26-1 | TRAJ5 | CIVTAYTGRRALTFG | 2 | 0 | 0.15 | Cancer specific
1756 | 109 | TRAV26-1 | TRAJ49 | CIVRVPNTGNQFYFG | 2 | 0 | 0.52 | Cancer specific
1757 | 110 | TRAV26-1 | TRAJ44 | CIVRADTGTASKLTFG | 2 | 0 | 0.21 | Cancer specific
1758 | 111 | TRAV26-1 | TRAJ34 | CIVRVDNTDKLIFG | 2 | 1 | 0.88 | Cancer specific
1759 | 112 | TRAV22 | TRAJ13 | CAGSLRGYQKVTFG | 2 | 0 | 0.23 | Cancer specific
1760 | 113 | TRAV22 | TRAJ12 | CAGMDSSYKLIFG | 2 | 0 | 0.23 | Cancer specific
1761 | 114 | TRAV21 | TRAJ9 | CAVGNTGGFKTIFG | 2 | 0 | 0.71 | Cancer specific
1762 | 115 | TRAV21 | TRAJ6 | CAVKGGSYIPTFG | 2 | 0 | 0.08 | Cancer specific
1763 | 116 | TRAV21 | TRAJ48 | CAVNHFGNEKLTFG | 2 | 0 | 0.62 | Cancer specific
1764 | 117 | TRAV21 | TRAJ44 | CAVSTGTASKLTFG | 2 | 0 | 0.59 | Cancer specific
1765 | 118 | TRAV21 | TRAJ44 | CAVRGTGTASKLTFG | 2 | 0 | 0.88 | Cancer specific
1766 | 119 | TRAV21 | TRAJ41 | CAVARGSGYALNFG | 2 | 0 | 0.15 | Cancer specific
1767 | 120 | TRAV21 | TRAJ29 | CAVNSGNTPLVFG | 2 | 0 | 0.25 | Cancer specific
1768 | 121 | TRAV21 | TRAJ28 | CAVNYGQNFVFG | 2 | 0 | 0.44 | Cancer specific
1769 | 122 | TRAV21 | TRAJ22 | CAVPFWFCKATDLW | 2 | 0 | 3.29 | Cancer specific
1770 | 123 | TRAV21 | TRAJ10 | CAVGSGGGNKLTFG | 2 | 0 | 0.64 | Cancer specific
1771 | 124 | TRAV20 | TRAJ57 | CAVQGGSEKLVFG | 2 | 1 | 0.54 | Cancer specific
1772 | 125 | TRAV20 | TRAJ52 | CAVQVRGTSYGKLTFG | 2 | 0 | 0.43 | Cancer specific
1773 | 126 | TRAV20 | TRAJ22 | CAVSGSARQLTFG | 2 | 0 | 1.05 | Cancer specific
1774 | 127 | TRAV2 | TRAJ6 | CAVEGTGGSYIPTFG | 2 | 0 | 0.38 | Cancer specific
1775 | 128 | TRAV2 | TRAJ5 | FPHGQESTYFW | 2 | 0 | 3.18 | Cancer specific
1776 | 129 | TRAV2 | TRAJ5 | CAVDMDTGRRALTFG | 2 | 0 | 0.13 | Cancer specific
1777 | 130 | TRAV2 | TRAJ44 | CAVGNTGTASKLTFG | 2 | 0 | 3.51 | Cancer specific
1778 | 131 | TRAV2 | TRAJ43 | CAVEDNNDMRFG | 2 | 0 | 0.39 | Cancer specific
1779 | 132 | TRAV2 | TRAJ42 | CAVDYGGSQGNLIFG | 2 | 1 | 0.08 | Cancer specific
1780 | 133 | TRAV2 | TRAJ37 | CAVEWSSNTGKLIFG | 2 | 0 | 0.15 | Cancer specific
1781 | 134 | TRAV2 | TRAJ34 | CAVPYNTDKLIFG | 2 | 0 | 0.36 | Cancer specific
1782 | 135 | TRAV2 | TRAJ34 | CAVAVDKLIFG | 2 | 0 | 0.45 | Cancer specific
1783 | 136 | TRAV2 | TRAJ33 | CAVKRGDSNYQLIWG | 2 | 0 | 0.24 | Cancer specific
1784 | 137 | TRAV2 | TRAJ33 | CAVEDNYQLIWG | 2 | 0 | 0.13 | Cancer specific
1785 | 138 | TRAV2 | TRAJ33 | CAVDSNYQLIWG | 2 | 0 | 0.31 | Cancer specific
1786 | 139 | TRAV2 | TRAJ31 | CAVELNARLMFG | 2 | 0 | 3.40 | Cancer specific
1787 | 140 | TRAV2 | TRAJ30 | CAVEDRRDDKIIFG | 2 | 0 | 0.11 | Cancer specific
1788 | 141 | TRAV2 | TRAJ3 | CAVEDQNSSASKIIFG | 2 | 0 | 0.61 | Cancer specific

TABLE 3-11-6
1789 | 142 | TRAV2 | TRAJ3 | CAVALQQCFQDNLW | 2 | 0 | 0.97 | Cancer specific
1790 | 143 | TRAV2 | TRAJ27 | CAANAGKSTFG | 2 | 0 | 0.71 | Cancer specific
1791 | 144 | TRAV2 | TRAJ26 | CAVYNYGQNFVFG | 2 | 0 | 1.34 | Cancer specific
1792 | 145 | TRAV2 | TRAJ26 | CAVEDRNYGQNFVFG | 2 | 1 | 0.22 | Cancer specific
1793 | 146 | TRAV2 | TRAJ26 | CAVDNYGQNFVFG | 2 | 3 | 0.36 | Cancer specific
1794 | 147 | TRAV2 | TRAJ26 | CAADNYGQNFVFG | 2 | 0 | 1.62 | Cancer specific
1795 | 148 | TRAV2 | TRAJ22 | CAVESAARQLTFG | 2 | 0 | 3.72 | Cancer specific
1796 | 149 | TRAV2 | TRAJ20 | CAVSSNDYKLSFG | 2 | 0 | 0.20 | Cancer specific
1797 | 150 | TRAV2 | TRAJ15 | CAVPNQAGTALIFG | 2 | 0 | 0.12 | Cancer specific
1798 | 151 | TRAV2 | TRAJ15 | CAVANQAGTALIFG | 2 | 1 | 0.52 | Cancer specific
1799 | 152 | TRAV2 | TRAJ13 | CAVLNSGGYQKVTFG | 2 | 0 | 0.25 | Cancer specific
1800 | 153 | TRAV19 | TRAJ41 | CALSEFSGYALNFG | 2 | 0 | 0.51 | Cancer specific
1801 | 154 | TRAV17 | TRAJ30 | CATVSNRDDKIIFG | 2 | 0 | 0.19 | Cancer specific
1802 | 155 | TRAV16 | TRAJ57 | CALATQGGSEKLVFG | 2 | 0 | 0.05 | Cancer specific
1803 | 156 | TRAV16 | TRAJ47 | CALSLKYGNKLVFG | 2 | 0 | 1.34 | Cancer specific
1804 | 157 | TRAV14 | TRAJ22 | CAMREPWNSGSARQLTFG | 2 | 0 | 0.09 | Cancer specific
1805 | 158 | TRAV13-2 | TRAJ6 | CAENPTGGSYIPTFG | 2 | 0 | 0.69 | Cancer specific
1806 | 159 | TRAV13-2 | TRAJ56 | CAESPTGANSKLTFG | 2 | 0 | 0.33 | Cancer specific
1807 | 160 | TRAV13-2 | TRAJ45 | CAEPRRGGADGLTFG | 2 | 0 | 0.36 | Cancer specific
1808 | 161 | TRAV13-2 | TRAJ39 | CAENNAGNMLTFG | 2 | 6 | 0.33 | Cancer specific
1809 | 162 | TRAV13-2 | TRAJ34 | CAENIKNTDKLIFG | 2 | 0 | 0.28 | Cancer specific
1810 | 163 | TRAV13-2 | TRAJ21 | CAERGGINKFYFG | 2 | 0 | 0.24 | Cancer specific
1811 | 164 | TRAV13-2 | TRAJ15 | CAENQAGTALIFG | 2 | 3 | 0.34 | Cancer specific
1812 | 165 | TRAV13-1 | TRAJ9 | CAASKGGEKTIFG | 2 | 0 | 0.18 | Cancer specific
1813 | 166 | TRAV13-1 | TRAJ52 | CAAAGGTSYGKLTFG | 2 | 1 | 0.19 | Cancer specific
1814 | 167 | TRAV13-1 | TRAJ5 | CAADTGRRALTFG | 2 | 0 | 0.18 | Cancer specific
1815 | 168 | TRAV13-1 | TRAJ45 | CAASSYSGGGADGLTFG | 2 | 0 | 0.25 | Cancer specific
1816 | 169 | TRAV13-1 | TRAJ45 | CAAPRVGGGADGLTFG | 2 | 0 | 1.38 | Cancer specific
1817 | 170 | TRAV13-1 | TRAJ33 | CAASKRSNYQLIWG | 2 | 0 | 0.08 | Cancer specific
1818 | 171 | TRAV13-1 | TRAJ33 | CAASKGSNYQLIWG | 2 | 1 | 0.04 | Cancer specific
1819 | 172 | TRAV13-1 | TRAJ32 | CAASYGGATNKLIFG | 2 | 0 | 0.16 | Cancer specific
1820 | 173 | TRAV13-1 | TRAJ3 | CAARGSSASKIIFG | 2 | 0 | 0.13 | Cancer specific
1821 | 174 | TRAV13-1 | TRAJ27 | CAATYRNAGKSTFG | 2 | 0 | 0.26 | Cancer specific
1822 | 175 | TRAV13-1 | TRAJ23 | CAASLYNQGGKLIFG | 2 | 1 | 0.53 | Cancer specific
1823 | 176 | TRAV13-1 | TRAJ21 | CAASRGNFNKFYFG | 2 | 0 | 0.13 | Cancer specific

TABLE 3-11-7
1824 | 177 | TRAV13-1 | TRAJ20 | CAAQKGDYKLSFG | 2 | 0 | 0.08 | Cancer specific
1825 | 178 | TRAV13-1 | TRAJ15 | CAASNQAGTALIFG | 2 | 3 | 0.53 | Cancer specific
1826 | 179 | TRAV13-1 | TRAJ15 | CAANQAGTALIFG | 2 | 1 | 0.23 | Cancer specific
1827 | 180 | TRAV13-1 | TRAJ10 | CAATREEETNSPL | 2 | 0 | 0.09 | Cancer specific
1828 | 181 | TRAV12-3 | TRAJ37 | CAMSASSNTGKLIFG | 2 | 1 | 1.82 | Cancer specific
1829 | 182 | TRAV12-3 | TRAJ31 | CAMNNNARLMFG | 2 | 0 | 0.18 | Cancer specific
1830 | 183 | TRAV12-3 | TRAJ27 | CAMRGIRDAGKSTFG | 2 | 0 | 0.17 | Cancer specific
1831 | 184 | TRAV12-3 | TRAJ23 | CAMSAYNQGGKLIFG | 2 | 0 | 0.38 | Cancer specific
1832 | 185 | TRAV12-3 | TRAJ21 | CAMSEGRHNFNKFTLD | 2 | 0 | 0.19 | Cancer specific
1833 | 186 | TRAV12-3 | TRAJ11 | CAMTGYSTLTFG | 2 | 0 | 0.08 | Cancer specific
1834 | 187 | TRAV12-2 | TRAJ6 | CAVYRRKLHTYIW | 2 | 0 | 0.19 | Cancer specific
1835 | 188 | TRAV12-2 | TRAJ3 | CAVYSSASKIIFG | 2 | 2 | 0.08 | Cancer specific
1836 | 189 | TRAV12-1 | TRAJ9 | CGLNTGGFKTIFG | 2 | 0 | 0.80 | Cancer specific
1837 | 190 | TRAV12-1 | TRAJ6 | CVVNEGGSYIPTFG | 2 | 1 | 0.99 | Cancer specific
1838 | 191 | TRAV12-1 | TRAJ5 | CVVPLLMDTGRRALTFG | 2 | 0 | 0.13 | Cancer specific
1839 | 192 | TRAV12-1 | TRAJ5 | CVVNMDTGRRALTFG | 2 | 0 | 0.14 | Cancer specific
1840 | 193 | TRAV12-1 | TRAJ42 | CVLKPRGSQGNLIFG | 2 | 0 | 0.22 | Cancer specific
1841 | 194 | TRAV12-1 | TRAJ36 | CVVNSPGANNLFFG | 2 | 0 | 0.10 | Cancer specific
1842 | 195 | TRAV12-1 | TRAJ26 | CVVNDYGQNFVFG | 2 | 2 | 0.14 | Cancer specific
1843 | 196 | TRAV12-1 | TRAJ20 | CAVNDYKLSFG | 2 | 1 | 0.36 | Cancer specific
1844 | 197 | TRAV1-2 | TRAJ33 | CAVRDSSNYQLIWG | 2 | 0 | 0.07 | MAIT
1845 | 198 | TRAV1-2 | TRAJ33 | CAVRDGNYQLIWG | 2 | 2 | 0.42 | MAIT
1846 | 199 | TRAV1-2 | TRAJ33 | CAVMDSNYQLIWA | 2 | 3 | 0.06 | MAIT
1847 | 200 | TRAV1-2 | TRAJ33 | CAVLDSNYQLIWA | 2 | 1 | 0.04 | MAIT
1848 | 201 | TRAV1-2 | TRAJ33 | CATMDSNYQLIWG | 2 | 2 | 0.07 | MAIT
1849 | 202 | TRAV1-2 | TRAJ32 | CAVRDHGGATNKLIFG | 2 | 0 | 0.49 | Cancer specific
1850 | 203 | TRAV1-2 | TRAJ15 | CAVRGQAGTALIFG | 2 | 0 | 0.34 | Cancer specific
1851 | 204 | TRAV1-2 | TRAJ12 | CASLDSSYKLIFG | 2 | 0 | 0.34 | MAIT
1852 | 205 | TRAV1-1 | TRAJ33 | CAVRDSNYQLIWG | 2 | 0 | 0.26 | Cancer specific
1853 | 206 | TRAV1-1 | TRAJ29 | CAVRDSRRGNTPLVFG | 2 | 0 | 0.13 | Cancer specific
1854 | 207 | TRAV1-1 | TRAJ27 | CAVREPNTNAGKSTFG | 2 | 0 | 0.30 | Cancer specific
1855 | 208 | TRAV1-1 | TRAJ17 | CAVKAAGNKLTFG | 2 | 0 | 0.11 | Cancer specific
1856 | 209 | TRAV10 | TRAJ8 | CVVTLTMNTGFQKLVFG | 2 | 0 | 2.11 | Cancer specific
1857 | 210 | TRAV10 | TRAJ40 | CVVPTSGTYKYIFG | 2 | 0 | 1.08 | Cancer specific
1858 | 211 | TRAV10 | TRAJ4 | CVVTPSRAGGYNKLILE | 2 | 0 | 0.15 | Cancer specific

TABLE 3-11-8
1859 | 212 | TRAV10 | TRAJ4 | CVVSAESGGYNKLILE | 2 | 0 | 0.90 | Cancer specific
1860 | 213 | TRAV10 | TRAJ4 | CVVSAESGGYNKLIFG | 2 | 0 | 1.70 | Cancer specific

11. Extraction of Cancer Specific TCR Sequence

TCRα reads that overlap at a high frequency include many invariant TCRs. These sequences are also present in healthy individuals (the normal controls) and are not TCRs that react to a tumor antigen. To extract cancer specific TCRs, overlapping reads in cancer tissue that were not detected in any sample from a healthy individual were classified as cancer specific TCRs (Table 3-12). For the TCRα chain, 56 overlapping reads were also present in healthy individuals, whereas for the TCRβ chain there was only one such read. Every read shared by 4 or more patients was either also present in a healthy individual or was an invariant TCR. Cancer specific reads, each shared by 3 or fewer patients, numbered 157 (1.22%) for the TCRα chain and 48 (0.11%) for the TCRβ chain (see also FIG. 49 for the method of estimating TCRαβ pair reads).
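The classification rule in this paragraph is a set difference: keep clonotypes shared by at least two patients and absent from every healthy sample. The counts are consistent with Table 3-10: 213 - 157 = 56 TCRα reads and 49 - 48 = 1 TCRβ read were also seen in healthy individuals. A minimal Python sketch follows. The function name is hypothetical, and the sketch assumes that reads are keyed by (V, J, CDR3), as in the healthy-individual analysis.

```python
def cancer_specific(patient_carriers: dict, healthy_clonotypes: set, min_patients: int = 2) -> dict:
    """Clonotypes shared by >= min_patients patients and absent from every healthy sample.

    patient_carriers: (V, J, CDR3) -> set of patient IDs carrying that clonotype.
    healthy_clonotypes: every (V, J, CDR3) observed in any healthy individual.
    Returns clonotype -> number of patients sharing it.
    """
    return {
        clonotype: len(patients)
        for clonotype, patients in patient_carriers.items()
        if len(patients) >= min_patients and clonotype not in healthy_clonotypes
    }
```

For example, a TRAV1-2/TRAJ33 MAIT clonotype shared by 12 patients but also found in healthy donors is excluded. A clonotype shared by 3 patients and never seen in healthy donors is kept with a count of 3.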

Table 3-12 Overlapping TCRβ Read Sequences and Cancer Specific TCRβ Reads in Cancer Patients

TABLE 3-12-1
SEQ ID NO | No. | TRBV | TRBJ | CDR3 | No. of cancer patients with overlap | No. of healthy individuals with overlap | % Read (mean) | TCR type
1861 | 1 | TRBV20-1 | TRBJ2-1 | CSARFPGGGREQFFG | 3 | 0 | 0.43 | Cancer specific
1862 | 2 | TRBV13 | TRBJ2-7 | CASSLAGGPYEQYFG | 3 | 0 | 1.98 | Cancer specific
1863 | 3 | TRBV9 | TRBJ2-1 | CASSSTDTQYFG | 2 | 0 | 0.16 | Cancer specific
1864 | 4 | TRBV7-9 | TRBJ2-3 | CASSVDGDSYNEQFFG | 2 | 0 | 0.20 | Cancer specific
1865 | 5 | TRBV7-9 | TRBJ1-6 | CASSSTDTQYFQ | 2 | 0 | 0.04 | Cancer specific
1866 | 6 | TRBV7-9 | TRBJ1-3 | CASSLSGDNSPLHFG | 2 | 1 | 0.06 | Cancer specific
1867 | 7 | TRBV7-8 | TRBJ2-2 | CASSSPRGELFFG | 2 | 0 | 0.50 | Cancer specific
1868 | 8 | TRBV7-8 | TRBJ1-3 | CASSRMGQGVGGNTIYFG | 2 | 0 | 0.10 | Cancer specific
1869 | 9 | TRBV7-6 | TRBJ2-1 | CASSQRTSGITNEQFFG | 2 | 0 | 0.47 | Cancer specific
1870 | 10 | TRBV7-3 | TRBJ2-2 | CASSLIGAGELFFW | 2 | 0 | 0.47 | Cancer specific
1871 | 11 | TRBV6-6 | TRBJ2-3 | CASSTSSDTQYFW | 2 | 0 | 0.16 | Cancer specific
1872 | 12 | TRBV6-6 | TRBJ1-1 | CASSYGMGVNTEAFFG | 2 | 0 | 0.33 | Cancer specific
1873 | 13 | TRBV6-5 | TRBJ2-7 | CASSIQGYEQYFG | 2 | 0 | 0.19 | Cancer specific
1874 | 14 | TRBV6-5 | TRBJ2-3 | CASGWASTDTQYFG | 2 | 0 | 0.21 | Cancer specific
1875 | 15 | TRBV6-3 | TRBJ2-7 | CASSYGASSYEQYFG | 2 | 0 | 0.25 | Cancer specific

TABLE 3-12-2
1876 | 16 | TRBV6-3 | TRBJ-5 | CASSYTAKKETQYFG | 2 | 0 | 0.40 | Cancer specific
1877 | 17 | TRBV5-1 | TRBJ2-7 | CASSASLAGYEQYFG | 2 | 0 | 0.11 | Cancer specific
1878 | 18 | TRBV4-3 | TRBJ2-1 | CASSHNIGTGNEQFFG | 2 | 0 | 1.75 | Cancer specific
1879 | 19 | TRBV4-3 | TRBJ1-2 | CASSQDRSRVYGYTFG | 2 | 0 | 0.13 | Cancer specific
1880 | 20 | TRBV4-2 | TRBJ1-2 | CASSQDVYGYTFG | 2 | 0 | 0.55 | Cancer specific
1881 | 21 | TRBV4-1 | TRBJ2-7 | CASSQDLGVLRAVLR | 2 | 0 | 0.17 | Cancer specific
1882 | 22 | TRBV4-1 | TRBJ2-1 | CASSLAQDYNECIFFG | 2 | 0 | 0.58 | Cancer specific
1883 | 23 | TRBV4-1 | TRBJ1-5 | CASSQAPGQGAHFG | 2 | 0 | 0.14 | Cancer specific
1884 | 24 | TRBV29-1 | TRBJ2-7 | CSVAAGVNYEQYFG | 2 | 0 | 2.39 | Cancer specific
1885 | 25 | TRBV29-1 | TRBJ2-1 | CSVRPGTSGRGNEQFFG | 2 | 0 | 0.16 | Cancer specific
1886 | 26 | TRBV29-1 | TRBJ2-1 | CSVLREITYNEQFFG | 2 | 0 | 0.18 | Cancer specific
1887 | 27 | TRBV29-1 | TRBJ2-1 | CSVEPGAREQFFG | 2 | 0 | 0.44 | Cancer specific
1888 | 28 | TRBV29-1 | TRBJ2-1 | CSVDLYNEQFFG | 2 | 0 | 4.36 | Cancer specific
1889 | 29 | TRBV29-1 | TRBJ2-1 | CSAMLIGGGNEQFFG | 2 | 0 | 1.83 | Cancer specific
1890 | 30 | TRBV25-1 | TRBJ2-5 | CASGQETQYFG | 2 | 0 | 0.34 | Cancer specific
1891 | 31 | TRBV24/OR9-2 | TRBJ2-1 | CATSDLSGGSRSSYNEQFFG | 2 | 0 | 0.25 | Cancer specific
1892 | 32 | TRBV20-1 | TRBJ2-7 | CSAPGGNLRAVLR | 2 | 0 | 0.47 | Cancer specific
1893 | 33 | TRBV20-1 | TRBJ2-6 | CSAWDFTNSGANVLTFG | 2 | 0 | 0.14 | Cancer specific
1894 | 34 | TRBV20-1 | TRBJ2-5 | CSARGRWAETQYFG | 2 | 0 | 0.18 | Cancer specific
1895 | 35 | TRBV20-1 | TRBJ2-5 | CSAKQASETQYFG | 2 | 0 | 0.09 | Cancer specific
1896 | 36 | TRBV20/OR9-2 | TRBJ2-4 | CSARDWRGAKNIQYFG | 2 | 0 | 0.08 | Cancer specific
1897 | 37 | TRBV19 | TRBJ2-1 | CASSMIREYNEQFFG | 2 | 0 | 0.39 | Cancer specific
1898 | 38 | TRBV19 | TRBJ2-1 | CASSITSASYEQFFG | 2 | 0 | 0.08 | Cancer specific
1899 | 39 | TRBV19 | TRBJ1-5 | CASSILGNGNQPQHFG | 2 | 0 | 0.37 | Cancer specific
1900 | 40 | TRBV19 | TRBJ1-2 | CASSIERGIYGYTFG | 2 | 0 | 0.32 | Cancer specific
1901 | 41 | TRBV18 | TRBJ2-7 | CASSPLNEYEQYFG | 2 | 0 | 0.05 | Cancer specific
1902 | 42 | TRBV12-4 | TRBJ2-7 | CASSMGTGIYEGYFG | 2 | 0 | 0.68 | Cancer specific
1903 | 43 | TRBV12-4 | TRBJ1-1 | CASSFSAPKPTPLSL | 2 | 0 | 0.08 | Cancer specific
1904 | 44 | TRBV12-3 | TRBJ1-1 | CASSLRANTEAFFG | 2 | 0 | 0.80 | Cancer specific
1905 | 45 | TRBV11-2 | TRBJ2-3 | CASSSAGDTQYFW | 2 | 0 | 0.30 | Cancer specific
1906 | 46 | TRBV11-1 | TRBJ2-7 | CASSRRQGAYEQYFG | 2 | 0 | 0.33 | Cancer specific
1907 | 47 | TRBV11-1 | TRBJ1-2 | CASSLPGYGYTFG | 2 | 0 | 0.21 | Cancer specific
1908 | 48 | TRBV10-3 | TRBJ2-3 | CAISERRIAGTSTDTQYFG | 2 | 0 | 0.69 | Cancer specific
1909 | 49 | TRBV10-3 | TRBJ1-2 | CAISEFAGPEGYTFG | 2 | 0 | 0.19 | Cancer specific

Table 3-13 Estimation of Paired TCRαβ by Combination of Overlapping Individuals

TABLE 3-13
TCRβ read number | TCRα read number(s)
1 |
2 |
3 | 205
4 | 49
5 | 176
6 | 121
7 |
8 |
9 | 64
10 | 144
11 |
12 | 54
13 | 54
14 | 24, 25, 111, 162, 210
15 | 76
16 | 112, 139
17 |
18 | 43
19 | 30
20 | 89, 133
21 | 72, 82, 84, 95, 100, 129, 168, 179, 183, 186, 191
22 |
23 | 96, 159
24 | 112, 139
25 |
26 |
27 | 96, 159
28 | 43
29 |
30 |
31 | 42, 62
32 |
33 | 75, 119, 157
34 | 54, 208
35 |
36 | 77
37 |
38 | 46, 102, 125, 155, 204
39 |
40 |
41 |
42 | 189
43 |
44 | 158
45 | 144
46 |
47 |
48 | 112, 139
49 | 96, 159
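Table 3-13 is titled as an estimation of TCRαβ pairs by the combination of overlapping individuals. The actual procedure is given in FIG. 49 and is not reproduced here. The Python sketch below encodes one plausible reading, which is an assumption: a TCRβ read and a TCRα read are proposed as a candidate pair when they are found in exactly the same set of patients. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def pair_by_patient_set(alpha_carriers: dict, beta_carriers: dict) -> dict:
    """For each beta read, list the alpha reads found in exactly the same patients.

    alpha_carriers / beta_carriers: read number -> set of patient IDs.
    The matching rule is an assumed illustration, not the procedure of FIG. 49.
    """
    alpha_by_set = defaultdict(list)
    for alpha_id, patients in alpha_carriers.items():
        alpha_by_set[frozenset(patients)].append(alpha_id)
    return {
        beta_id: sorted(alpha_by_set.get(frozenset(patients), []))
        for beta_id, patients in beta_carriers.items()
    }
```

Under this rule, a β read with no α read sharing its exact patient set gets an empty list, similar to the blank rows of Table 3-13. Several α reads sharing one patient set (as for TCRβ read 21) would all be listed as candidates.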

(Example 3 of analysis system: Sequencing using Ion PGM system (Ion Torrent))

(1. RNA extraction)

5 mL of whole blood was collected from a healthy individual in a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from the isolated PBMCs using an RNeasy Lipid Tissue Mini Kit (Qiagen, Germany). The resulting RNA was quantified using an Agilent 2100 Bioanalyzer (Agilent).

(2. Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA sample was used to carry out adaptor-ligation PCR, in accordance with the method shown in Example 1. Specifically, a BSL-18E primer (Table 3-14) and the RNA were mixed and annealed, and a reverse transcriptase was then used to synthesize complementary DNA. Double stranded DNA was subsequently synthesized, and T4 DNA polymerase was used to perform a 5′ terminal blunting reaction. After column purification with a High Pure PCR Cleanup Micro Kit (Roche), a P20EA/P10EA adaptor was added in a ligation reaction. The adaptor-added double stranded complementary DNA was purified on a column and digested with NotI restriction enzyme.

TABLE 3-14 Primer sequences
Primer | Sequence
BSL-18E | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 32)
P20EA | TAATACGACTCCGAATTCCC (SEQ ID NO: 33)
P10EA | GGGAATTCGG (SEQ ID NO: 34)
CA1 | TGTTGAAGGCGTTTGCACATGCA (SEQ ID NO: 35)
CA2 | GTGCATAGACCTCATGTCTAGCA (SEQ ID NO: 36)
CB1 | GAACTGGACTTGACAGCGGAACT (SEQ ID NO: 37)
CB2 | AGGCAGTATCTGGAGTCATTGAG (SEQ ID NO: 38)

(3. PCR)

The 1st PCR amplification was performed on the double stranded complementary DNA by using the common adaptor primer P20EA and a TCRα chain or β chain C region specific primer (CA1 or CB1) shown in Table 3-14. 20 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C., with the following reaction composition.

TABLE 3-15A 1st PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CA1 or CB1 primer | 0.5 | 250 nM
Double stranded complementary DNA | 2 | -
Sterilized water | 7 | -
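The final concentrations in this table follow from the dilution relation C1·V1 = C2·V2 over a 20 μL reaction. A minimal Python check, assuming 10 μM primer stocks (the stock concentration at which 0.5 μL in 20 μL gives the listed 250 nM); the function name is illustrative only:

```python
def final_concentration_nM(stock_uM: float, added_uL: float, total_uL: float) -> float:
    """Dilution relation C1*V1 = C2*V2, returning the final concentration in nM."""
    return stock_uM * 1000.0 * added_uL / total_uL


# Volumes (uL) taken from the 1st PCR reaction composition table.
first_pcr_uL = {
    "2x ExTaq Premix": 10.0,
    "P20EA primer": 0.5,
    "CA1 or CB1 primer": 0.5,
    "double stranded cDNA": 2.0,
    "sterilized water": 7.0,
}

total_uL = sum(first_pcr_uL.values())
# Assumption: primer stocks are 10 uM.
p20ea_nM = final_concentration_nM(10.0, first_pcr_uL["P20EA primer"], total_uL)
print(f"total volume: {total_uL} uL, P20EA final: {p20ea_nM} nM")
```

The same function with 1 μL of a 10 μM primer in 20 μL gives the 500 nM listed for the later PCR steps.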

The 1st PCR amplicon was then used to perform a 2nd PCR with the reaction composition shown below, using a P20EA primer and a TCRα chain or β chain C region specific primer (CA2 or CB2). 20 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-15B 2nd PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CA2 or CB2 primer | 1 | 500 nM
1st PCR amplicon | 2 | -
Sterilized water | 6 | -

PCR was performed with the 2nd PCR amplicon diluted 10-fold as a template, using a B-P20EA primer shown in FIG. 10, which is the P20EA adaptor primer with an adaptor B sequence added, and HuVaF-01 to HuVaF-10 (α chain) and HuVbF-01 to HuVbF-10 (β chain), which are TCRα chain or β chain C region specific primers with an adaptor A sequence and an individual MID tag sequence (MID-1 to 26) added. The primer sequences used are shown in Table 6. 10 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C. To confirm PCR amplification, 10 μL of the amplicon was analyzed by 2% agarose gel electrophoresis.

TABLE 3-16 Sequencing primers (sequence shown as adaptor A and key, MID tag, C region specific sequence)
Primer | Sequence | MID tag
HuVaF-01 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGAGTGCGT ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 40) | MID 1
HuVaF-02 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGCTCGACA ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 41) | MID 2
HuVaF-03 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGACGCACTC ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 42) | MID 3
HuVaF-04 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGCACTGTAG ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 43) | MID 4
HuVaF-05 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATCAGACACG ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 44) | MID 5
HuVaF-06 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATATCGCGAG ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 45) | MID 6
HuVaF-07 | CCATCTCATCCCTGCGTGTCTCCGACTCAG CGTGTCTCTA ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 46) | MID 7
HuVaF-08 | CCATCTCATCCCTGCGTGTCTCCGACTCAG CTCGCGTGTC ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 47) | MID 8
HuVaF-09 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TCTGTATGCG ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 48) | MID 9
HuVaF-10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TGATACGTCT ATAGGCAGACAGACTTGTCACTG (SEQ ID NO: 49) | MID 11
HuVbF-01 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATACGACGTA ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 50) | MID 15
HuVbF-02 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TCACGTACTA ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 51) | MID 16
HuVbF-03 | CCATCTCATCCCTGCGTGTCTCCGACTCAG CGTCTAGTAC ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 52) | MID 17
HuVbF-04 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TCTACGTAGC ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 53) | MID 18
HuVbF-05 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TGTACTACTC ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 54) | MID 19
HuVbF-06 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGACTACAG ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 55) | MID 20
HuVbF-07 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGACTACAG ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 56) | MID 21
HuVbF-08 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TACGAGTATG ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 57) | MID 22
HuVbF-09 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TACTCTCGTG ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 58) | MID 23
HuVbF-10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TAGAGACGAG ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NO: 59) | MID 24
B-P20EA | CCTATCCCCTGTGTGCCTTGGCAGTCTAATACGACTCCGAATTCCC (SEQ ID NO: 60) | -

TABLE 3-15C 3rd PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM B-P20EA primer | 1 | 500 nM
10 μM HuVaF or HuVbF primer | 1 | 500 nM
2nd PCR amplicon | 1 | -
Sterilized water | 7 | -

An Ion OneTouch 2 system (Ion Torrent) was then used to perform emulsion PCR and prepare the template. The following solution was mixed by using an Ion OneTouch 2 (Ion Torrent) kit.

TABLE 3-17-1 and 3-17-2 Solution 1
Component | Volume
Sterilized water | 25 μL
Ion PGM Template OT2 200 Reagent Mix | 500 μL
Ion PGM Template OT2 200 Reagent B | 300 μL
Ion PGM Template OT2 200 Enzyme Mix | 50 μL
Diluted library | 25 μL
Total amount | 900 μL

The Ion Sphere Particles (ISPs) were stirred, and 100 μL of ISPs was then added and mixed as described below.

TABLE 3-18
Component | Volume
Solution 1 | 900 μL
Ion PGM Template OT2 200 Ion Sphere Particles | 100 μL
Total amount | 1000 μL

The above-described 1,000 μL is mixed thoroughly and then stirred for 5 minutes. After an Ion OneTouch Plus Reaction Filter Assembly is set up, the total amount described above is loaded. Furthermore, 500 μL of Ion OneTouch Reaction Oil is added, and the run is started. After about 5.5 hours of reaction, the sample is collected. After centrifugation to remove excess solution, the ISPs are collected.

Enrichment

An Ion OneTouch ES (Ion Torrent) is used to enrich the sample. A new tube is set in the chip loader, and the chip arm is installed. The following melt-off solution is then prepared.

Melt-Off Solution

TABLE 3-19
Component | Volume
Tween Solution | 280 μL
1 M sodium hydroxide | 40 μL
Total amount | 320 μL

Dispensing the Following Solutions into Each Well of an 8-Well Strip Tube

TABLE 3-20-1 and 3-20-2
Well | Solution | Volume
1 | ISP sample | 100 μL
2 | Dynabeads MyOne Beads | 130 μL
3 | Ion OneTouch Wash Solution | 300 μL
4 | Ion OneTouch Wash Solution | 300 μL
5 | Ion OneTouch Wash Solution | 300 μL
6 | Empty | -
7 | Melt-Off Solution | 300 μL
8 | Empty | -

After the reagents are set up, the Ion OneTouch ES is started for enrichment. After completion, the tube containing the ISPs is collected and mixed by gently inverting it 5 times. An Ion PGM Sequencing 200 Kit v2 (Ion Torrent) is then used for sequencing.

It can thus be understood that the system of the present invention can also be used with apparatuses other than Roche apparatuses.

(Example 4 of analysis system: TCR sequencing using Illumina MiSeq system)

The present Example examines whether the system of the present invention can be implemented for TCR sequencing using an Illumina MiSeq system.

(1. RNA Extraction)

5 mL of whole blood was collected from a healthy individual in a heparin-containing blood collection tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll density gradient centrifugation. Total RNA was extracted and purified from the isolated PBMCs by using an RNeasy Lipid Tissue Mini Kit (Qiagen, Germany). The resulting RNA was quantified by using an Agilent 2100 Bioanalyzer (Agilent).

(2. Synthesis of Complementary DNA and Double Stranded Complementary DNA)

The extracted RNA sample was used to carry out adaptor-ligation PCR, in accordance with the method shown in Example 1. Specifically, a BSL-18E primer (Table 3-21) and the RNA were mixed and annealed, and a reverse transcriptase was then used to synthesize complementary DNA. Double stranded complementary DNA was subsequently synthesized, and T4 DNA polymerase was used to perform a 5′ terminal blunting reaction. After column purification with a High Pure PCR Cleanup Micro Kit (Roche), a P20EA/P10EA adaptor was added in a ligation reaction. The adaptor-added double stranded complementary DNA was purified on a column and digested with NotI restriction enzyme.

TABLE 3-21 Primer sequences
Primer | Sequence
BSL-18E | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 32)
P20EA | TAATACGACTCCGAATTCCC (SEQ ID NO: 33)
P10EA | GGGAATTCGG (SEQ ID NO: 34)
CA1 | TGTTGAAGGCGTTTGCACATGCA (SEQ ID NO: 35)
CA2 | GTGCATAGACCTCATGTCTAGCA (SEQ ID NO: 36)
CB1 | GAACTGGACTTGACAGCGGAACT (SEQ ID NO: 37)
CB2 | AGGCAGTATCTGGAGTCATTGAG (SEQ ID NO: 38)

(3. PCR)

The 1st PCR amplification was performed on the double stranded complementary DNA by using the common adaptor primer P20EA shown in Table 1 and a TCRα chain or β chain C region specific primer (CA1 or CB1). 20 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C., with the reaction composition in the following Table 3-22.

TABLE 3-22 1st PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 0.5 | 250 nM
10 μM CA1 or CB1 primer | 0.5 | 250 nM
Double stranded complementary DNA | 2 | -
Sterilized water | 7 | -

The 1st PCR amplicon was then used to perform a 2nd PCR with the reaction composition shown in the following Table 3-23, using a P20EA primer and a TCRα chain or β chain C region specific primer (CA2 or CB2). 20 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-23 2nd PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P20EA primer | 1 | 500 nM
10 μM CA2 or CB2 primer | 1 | 500 nM
1st PCR amplicon | 2 | -
Sterilized water | 6 | -

(4. MiSeq Dual-Indexed Paired-End Sequencing)

A PCR amplification reaction is performed with the 2nd PCR amplicon diluted 10-fold as a template, using a P5-P20EA primer, which is the P20EA adaptor primer with a P5 sequence, an R1 Seq Primer sequence, and an Index 2 sequence added, and P7-CA3 or P7-CB3, which is a TCRα chain or β chain C region specific primer with a P7 sequence, an R2 Seq Primer sequence, and an Index 1 sequence added, as shown in FIG. 50. Different Index 1 and Index 2 sequences are used to label the amplification primers so that amplified TCR gene amplicons from a plurality of samples can be distinguished. The primer sequences used are shown in Table 3-24. 10 cycles of PCR were performed, each cycle consisting of 30 seconds at 95° C., 30 seconds at 55° C., and one minute at 72° C.

TABLE 3-24 Sequencing primers
Primer | Sequence
P5-P20EA | AATGATACGGCGACCACCGAGATCTACAC-(Index 2)-TCTTTCCCTACACGACGCTCTTCCGATCT-TAATACGACTCCGAATTCCC ((1)~(12) correspond to SEQ ID NOs: 1910~1920, respectively)
P7-CA3 | CAAGCAGAAGACGGCATACGAGAT-(Index 1)-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-ATAGGCAGACAGACTTGTCACTG ((1)~(8) correspond to SEQ ID NOs: 1922~1929, respectively)
P7-CB3 | CAAGCAGAAGACGGCATACGAGAT-(Index 1)-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-ACACCAGTGTGGCCTTTTGGGTG ((1)~(8) correspond to SEQ ID NOs: 1930~1937, respectively)
Index 1 | (1) TGAACCTT (2) TGCTAAGT (3) TGTTCTCT (4) TAAGACAC (5) CTAATCGA (6) CTAGAACA (7) TAAGTTCC (8) TAGACCTA
Index 2 | (1) ATCACGAC (2) ACAGTGGT (3) CAGATCCA (4) ACAAACGG (5) ACCCAGCA (6) AACCCCTC (7) CCCAACCT (8) CACCACAC (9) GAAACCCA (10) TGTGACCA (11) AGGGTCAA (12) AGGAGTGG
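With 8 Index 1 and 12 Index 2 sequences, up to 96 samples can be distinguished by their index pair. A minimal demultiplexing sketch in Python, assuming the index reads are reported in the same orientation as listed in Table 3-24 (in practice the Index 2 read orientation depends on the instrument and chemistry) and tolerating at most one mismatch; the sample names and the one-mismatch tolerance are hypothetical choices, not part of the described method:

```python
from typing import Dict, Optional, Tuple

# Index sequences as listed in Table 3-24.
INDEX1 = {1: "TGAACCTT", 2: "TGCTAAGT", 3: "TGTTCTCT", 4: "TAAGACAC",
          5: "CTAATCGA", 6: "CTAGAACA", 7: "TAAGTTCC", 8: "TAGACCTA"}
INDEX2 = {1: "ATCACGAC", 2: "ACAGTGGT", 3: "CAGATCCA", 4: "ACAAACGG",
          5: "ACCCAGCA", 6: "AACCCCTC", 7: "CCCAACCT", 8: "CACCACAC",
          9: "GAAACCCA", 10: "TGTGACCA", 11: "AGGGTCAA", 12: "AGGAGTGG"}


def mismatches(a: str, b: str) -> int:
    """Hamming distance; any length difference counts as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def match_index(observed: str, table: Dict[int, str], max_mismatches: int = 1) -> Optional[int]:
    """Return the index number within max_mismatches, or None if absent or ambiguous."""
    hits = [num for num, seq in table.items() if mismatches(observed, seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def assign_sample(i1_read: str, i2_read: str,
                  sample_sheet: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Map an (Index 1, Index 2) read pair to a sample name via a sample sheet."""
    i1 = match_index(i1_read, INDEX1)
    i2 = match_index(i2_read, INDEX2)
    if i1 is None or i2 is None:
        return None
    return sample_sheet.get((i1, i2))


# Hypothetical sample sheet: each sample is given a unique index pair.
sheet = {(1, 1): "sample_A_TCRa", (1, 2): "sample_B_TCRa"}
print(assign_sample("TGAACCTA", "ATCACGAC", sheet))  # one mismatch in Index 1
```

Returning None for ambiguous matches keeps a read with a sequencing error from being assigned to the wrong sample.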

TABLE 3-25-1 and 3-25-2 3rd PCR amplification reaction composition
Component | Content (μL) | Final concentration
2x ExTaq Premix (Takara) | 10 | 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl₂, 0.2 mM dNTPs, 0.5 U ExTaq polymerase
10 μM P5-P20EA primer | 1 | 500 nM
10 μM P7-CA3 or P7-CB3 primer | 1 | 500 nM
2nd PCR amplicon | 1 | -
Sterilized water | 7 | -

(5. Purification of PCR Product by Electrophoresis)

An E-Gel agarose gel electrophoresis system is used for electrophoresis of the amplified PCR product. A precast 2% agarose gel containing a highly sensitive fluorescent stain is set in the electrophoresis apparatus, and 20 μL of sample is loaded per well. The amplicon is collected when the band of interest, corresponding to 500-600 bp, is eluted. The amount of DNA in the collected PCR amplicon is measured by using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen). Based on the measured amounts of DNA, equimolar amounts of a plurality of samples are mixed for the sequencing reaction.
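Because PicoGreen gives a mass concentration, pooling samples at equimolar amounts requires converting mass to molarity using fragment length. A Python sketch, assuming an average mass of 660 g/mol per base pair and a representative fragment length within the 500-600 bp band; the sample names, readings, and the 10 fmol target are hypothetical:

```python
def dsdna_nM(conc_ng_per_uL: float, length_bp: float, g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration to molarity in nM (equivalently fmol/uL)."""
    return conc_ng_per_uL * 1e6 / (g_per_mol_per_bp * length_bp)


def pooling_volumes_uL(samples: dict, fmol_per_sample: float) -> dict:
    """Volume of each amplicon so that every sample contributes the same number of moles.

    samples maps a sample name to (concentration in ng/uL, fragment length in bp).
    """
    return {name: fmol_per_sample / dsdna_nM(conc, length)
            for name, (conc, length) in samples.items()}


# Hypothetical PicoGreen readings for two amplicons in the 500-600 bp window.
measured = {"sample_1": (2.0, 550), "sample_2": (4.0, 550)}
print(pooling_volumes_uL(measured, fmol_per_sample=10.0))
```

A sample at twice the concentration and the same length contributes half the volume, which is the intent of equimolar pooling.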

(6. MiSeq Sequencing)

A MiSeq sample sheet is created. PhiX control is added at 5-50%, and sequencing is started on a MiSeq sequencer loaded with a MiSeq Reagent Kit v3 (600-cycle, Illumina). Sequencing data can be obtained after about 65 hours.

(Example 5 of analysis system: High throughput sequence analysis method for determining the diversity and similarity of TCRα and TCRβ repertoires to identify potentially novel invariant TCRα chains, using invariant TCRs expressed by NKT cells and MAIT cells as examples)

The following comprehensive example summarizes the technologies described in Examples 1-4 of the analysis system.

(Introduction)

As discussed above, high throughput sequencing techniques known as next generation sequencing (NGS) have advanced rapidly, enabling large-scale sequence data analysis (Shendure J et al. (2008) Nat Biotechnol 26: 1135-1145; Metzker M L et al. (2010) Nat Rev Genet 11: 31-46). Several NGS-based TCR repertoire analysis systems have been developed by other researchers. However, many of them rely on multiplex PCR, which uses a different specific primer for each variable region. Bias during PCR amplification therefore cannot be avoided, because primers specific to different variable regions hybridize to their target genes with different dynamics. Thus, when a multiplex PCR assay is used, correction and additional computational standardization are considered necessary to minimize PCR bias (Carlson C S et al. (2013) Nat Commun 4: 2680). Use of a single primer set is the preferred approach for unbiased quantitative amplification of all TCR genes, including unknown variants whose 5′ terminal sequences are highly diverse. Single strand oligonucleotide anchor ligation to the 3′ terminal of a cDNA using T4 RNA ligase (Troutt A B et al. (1992) Proc Natl Acad Sci USA 89: 9823-9825), cDNA homopolymer tailing, 5′ rapid amplification of cDNA ends (RACE) (Frohman M A et al. (1988) Proc Natl Acad Sci USA 85: 8998-9002), and template switching PCR (TS-PCR or SMART PCR) (Zhu Y Y et al. (2001) Biotechniques 30: 892-897) have been used to analyze TCR repertoires (Freeman J D et al. (2009) Genome Res 19: 1817-1824; Warren R L et al. (2011) Genome Res 21: 790-797).
TS-PCR is simple and convenient, but the TS primer either anneals non-specifically to random regions of RNA or is added repeatedly. As a result, a high level of background amplification occurs (Alon S et al. (2011) Genome Res 21: 1506-1511; Kapteyn J (2010) BMC Genomics 11: 413). In this regard, the present specification describes adaptor-ligation mediated PCR (AL-PCR; first reported by Tsuruta et al. (Tsuruta Y et al. (1993) J Immunol Methods 161: 7-21; Tsuruta Y et al. (1994) J Immunol Methods 169: 17-23)), in which an adaptor is added to the 5′ terminal of double stranded complementary DNA synthesized from TCR transcripts, followed by PCR amplification with a constant region specific primer and an adaptor primer. Adaptor ligation to blunted double stranded complementary DNA is barely affected by the specific sequence of the cDNA, whereas the efficiency of 5′ adaptor ligation using T4 RNA ligase is sequence dependent (Jayaprakash A D et al. (2011) Nucleic Acids Res 39: e141). Furthermore, ligation of double stranded DNA using T4 DNA ligase is more efficient than single stranded DNA ligation using T4 RNA ligase in ligation anchored PCR (LA-PCR). Thus, this unbiased AL-PCR allows accurate analysis of a TCR repertoire without requiring correction or standardization.

Various sequencing techniques have been developed, such as Roche 454 (San Francisco, Calif.), Illumina (San Diego, Calif.), Ion Torrent (Life Technologies, Grand Island, N.Y.), SOLiD (Life Technologies), Helicos (Cambridge, Mass.), and PacBio (Menlo Park, Calif.). Among these NGS platforms, 454 DNA sequencing produces reads of 50-600 base pairs (bp) or more with sufficient read output, although it yields fewer reads per run than Illumina. Long read sequencing enables determination of the full-length or nearly full-length sequence of a TCR gene comprising the V, D, J, and C regions. Furthermore, a recombinant TCR protein can readily be generated by subsequent PCR cloning of the TCR gene. Thus, the inventors applied adaptor-ligation mediated PCR to NGS using 454 DNA sequencing.

Natural killer T (NKT) cells are a distinct T cell population that plays an important role in both innate and acquired immunity. NKT cells regulate a wide range of immune responses, such as those involved in autoimmune diseases, tumor surveillance, and host defense against pathogenic infection. NKT cells express an invariant TCRα consisting of Vα24 and Jα18, which recognizes glycolipids presented by CD1d, a nonclassical major histocompatibility complex (MHC) class I-like molecule (Godfrey D I et al. (2004) J Clin Invest 114: 1379-1388). Recently, mucosal-associated invariant T (MAIT) cells, which are preferentially present in mucosal tissue, have been shown to be the only T cell population expressing a semi-invariant TCRα consisting of Vα7.2 and Jα33. MAIT cells recognize microbial vitamin B metabolites presented by MHC class I-related protein 1 (MR1), a nonclassical MHC class I molecule (Kjer-Nielsen L et al. (2012) Nature 491: 717-723). These T cell populations with invariant TCRα play a central role in immunomodulation. However, it remains unknown whether invariant TCRα chains are expressed only by these T cell populations.

In the present study, the inventors used a newly developed NGS-based TCR repertoire analysis method to sequence TCR transcripts from 20 healthy individuals. First, the usage of variable and joining regions was examined based on the number of sequence reads, and then the clonality and diversity of TCRα and TCRβ genes were analyzed. Unique read sequences identified by using an independently developed gene analysis program were compared at the clone level among healthy individuals. The results showed that T cell diversity and TRV and TRJ usage were similar among individuals. Interestingly, TCRβ reads were not shared among individuals, whereas TCRα reads included public sequences shared by 2 or more individuals at high frequency. Public TCRα reads contained a high percentage of invariant TCRα, indicating the presence of iNKT or MAIT cells.

In the present Example, the inventors show from NGS data that analysis of TCR genes shared among a plurality of individuals can provide significant information on invariant TCRs expressed by NKT cells and MAIT cells.

(Demonstration in the Present Example)

High throughput sequencing of T cell receptor (TCR) genes can be a powerful tool for analyzing the clonality, diversity, and antigen specificity of T lymphocytes. In this regard, the inventors have developed a novel TCR repertoire analysis method using 454 DNA sequencing combined with adaptor-ligation mediated polymerase chain reaction (PCR). Unlike SMART PCR, which achieves only a pseudo-unbiased level of amplification, this method enables amplification of all TCR genes in a truly unbiased manner, free of the bias that generally occurs in PCR.

In the present Example, the inventors performed next generation sequencing (NGS) of TCRα and TCRβ genes in peripheral blood mononuclear cells from 20 healthy individuals to compare the diversity and similarity of expressed TCR repertoires and gene usage among individuals. 149,216 unique reads were identified from a total of 267,037 sequence reads from the 20 healthy individuals. Preferential use of some V genes and J genes was observed, while some TRAV-TRAJ recombinations appeared to be limited. The level of TCR diversity differed significantly between TCRα and TCRβ, and the TCRα repertoire was more similar among individuals than the TCRβ repertoire. The similarity of TCRα among individuals depended greatly on the presence of public TCRs shared among 2 or more individuals at high frequency. Public TCRα sequences were close to germline sequences and had shorter CDR3s. Public TCRα sequences, especially those shared among many individuals, often contained invariant TCRα derived from mucosal-associated invariant T cells and invariant natural killer T cells. The results suggest that searching for public TCRs by NGS is useful for identifying potentially novel invariant TCRα chains. This NGS method was found to be capable of highly precise, comprehensive analysis of a TCR repertoire at the clone level.

(Materials and Methods)

Isolation of Peripheral Blood Mononuclear Cells and RNA Extraction

Whole blood was collected from 20 healthy individuals after obtaining informed consent. The present study was approved by the ethics committee of the Clinical Research Center for Allergy and Rheumatology, National Hospital Organization, Sagamihara National Hospital. 10 mL of whole blood was collected in a heparin-treated tube. Peripheral blood mononuclear cells (PBMC) were separated by Ficoll-Paque PLUS™ (GE Healthcare Health Sciences, Uppsala, Sweden) density gradient centrifugation and washed with phosphate buffered saline (PBS). The cells were counted, and 1×10⁶ cells were used for RNA extraction. Total RNA was isolated and purified by using an RNeasy Lipid Tissue Mini Kit (Qiagen, Hilden, Germany) in accordance with the manufacturer's manual. The amount and purity of RNA were measured by using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, Calif.).

Unbiased Amplification of TCR Gene

1 μg of total RNA was converted to complementary DNA (cDNA) by using Superscript III reverse transcriptase (Invitrogen, Carlsbad, Calif.). A BSL-18E primer containing a poly(dT)18 stretch and a NotI site was used for cDNA synthesis. After cDNA synthesis, double stranded (ds) cDNA was synthesized by using E. coli DNA polymerase I (Invitrogen), E. coli DNA ligase (Invitrogen), and RNase H (Invitrogen). The ds-cDNA was blunted by using T4 DNA polymerase (Invitrogen). A P10EA/P20EA adaptor was ligated to the 5′ terminal of the ds-cDNA, which was then cleaved with NotI restriction enzyme. After removing the adaptor and primer by using a MinElute Reaction Cleanup kit (Qiagen), PCR was performed by using P20EA and either a TCRα chain constant region specific primer (CA1) or a TCRβ chain constant region specific primer (CB1) (Table 4-1). The PCR conditions were as follows: 20 cycles of 95° C. (30 seconds), 55° C. (30 seconds), and 72° C. (one minute). A 2nd PCR was performed under the same PCR conditions with either CA2 or CB2 and a P20EA primer.

TABLE 4-1
Primer | Sequence | MID Tag
BSL-18E | AAAGCGGCCGCATGCTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 32) | -
P20EA | TAATACGACTCCGAATTCCC (SEQ ID NO: 33) | -
P10EA | GGGAATTCGG (SEQ ID NO: 34) | -
CA1 | TGTTGAAGGCGTTTGCACATGCA (SEQ ID NO: 35) | -
CA2 | GTGCATAGACCTCATGTCTAGCA (SEQ ID NO: 36) | -
CB1 | GAACTGGACTTGACAGCGGAACT (SEQ ID NO: 37) | -
CB2 | AGGCAGTATCTGGAGTCATTGAG (SEQ ID NO: 38) | -
HuVaF-01 to -10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG-[MID]-ATAGGCAGACAGACTTGTCACTG (SEQ ID NOs: 40-49) | MID1~MID11
HuVbF-01 to -10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG-[MID]-ACACCAGTGTGGCCTTTTGGGTG (SEQ ID NOs: 50-59) | MID15~MID24
B-P20EA | CCTATCCCCTGTGTGCCTTGGCAGTCTAATACGACTCCGAATTCCC (SEQ ID NO: 60) | -

V: A/C/G; N: A/C/G/T. Adaptors A and B are described in bold and bold italic, respectively. The key sequence (TCAG) is underlined. The following MID Tag sequences were used to identify samples: MID1 (SEQ ID NO: 1325), MID2 (SEQ ID NO: 1326), MID3 (SEQ ID NO: 1327), MID4 (SEQ ID NO: 1328), MID5 (SEQ ID NO: 1329), MID6 (SEQ ID NO: 1330), MID7 (SEQ ID NO: 1331), MID8 (SEQ ID NO: 1332), MID10 (SEQ ID NO: 1334), MID11 (SEQ ID NO: 1335), MID15 (SEQ ID NO: 1339), MID16 (SEQ ID NO: 1340), MID17 (SEQ ID NO: 1341), MID18 (SEQ ID NO: 1342), MID19 (SEQ ID NO: 1343), MID20 (SEQ ID NO: 1344), MID21 (SEQ ID NO: 1345), MID22 (SEQ ID NO: 1346), MID23 (SEQ ID NO: 1347), MID24 (SEQ ID NO: 1348)

Sequencing of Amplicon by Roche 454 Sequencing System

An amplicon for NGS was prepared by amplifying the 2nd PCR product with a P20EA primer and a fusion tag primer (Table 4-1). Fusion tag primers, each comprising an adaptor A sequence (CCATCTCATCCCTGCGTGTCTCCGAC) (SEQ ID NO: 39), a 4-base key sequence (TCAG), a multiplex identifier (MID) tag sequence (10 nucleotides), and a TCR constant region specific sequence, were designed in accordance with the manufacturer's manual. After PCR amplification, the amplicon was separated and assessed by agarose gel electrophoresis. The resulting fragment (about 600 bp) was excised from the gel and purified by using a QIAEX II gel extraction kit (Qiagen). The amount of purified amplicon was quantified with a Quant-iT™ PicoGreen® dsDNA Assay Kit (Life Technologies, Carlsbad, Calif.). Amplicons obtained with different fusion tag primers from 10 healthy individuals were mixed at equimolar concentrations. Emulsion PCR (emPCR) was performed on the amplicon mixture with a GS Junior Titanium emPCR Lib-L kit (Roche 454 Life Sciences, Branford, Conn.) in accordance with the manufacturer's manual.
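Given this fusion primer design, a read from the adaptor A side is expected to begin with the key, then the 10-nucleotide MID, then the C region specific primer sequence, followed by TCR sequence running toward the V region. A Python sketch of that layout, assuming exact matches and reads that still carry the key (the 454 system software performs this trimming in practice); only three MIDs from Table 3-16 are included for brevity, and `parse_read` is a hypothetical helper:

```python
from typing import Optional, Tuple

KEY = "TCAG"
# A few MID tags as listed in Table 3-16 (MID 1, MID 2, MID 15).
MIDS = {"MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA", "MID15": "ATACGACGTA"}
# C region specific 3' parts of the HuVaF / HuVbF fusion primers.
C_REGION = {"TCRA": "ATAGGCAGACAGACTTGTCACTG", "TCRB": "ACACCAGTGTGGCCTTTTGGGTG"}


def parse_read(read: str) -> Optional[Tuple[str, str, str]]:
    """Split a read into (MID name, chain, remaining TCR sequence).

    Assumes the read still starts with the key and that MID and primer match exactly;
    returns None when the read does not fit the expected layout.
    """
    if not read.startswith(KEY):
        return None
    rest = read[len(KEY):]
    for mid_name, mid in MIDS.items():
        if rest.startswith(mid):
            insert = rest[len(mid):]
            for chain, primer in C_REGION.items():
                if insert.startswith(primer):
                    return mid_name, chain, insert[len(primer):]
            return None
    return None


example = KEY + MIDS["MID15"] + C_REGION["TCRB"] + "GGTACC"
print(parse_read(example))
```

Real reads carry sequencing errors, so a production parser would allow mismatches, as the index sketch for the MiSeq workflow does.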

Assignment of TRV and TRJ Segments

All read sequences were classified by MID tag sequence. Artificially added sequences (tag, adaptor, and key) and low quality sequences were trimmed from both ends of each read by using software installed on the 454 sequencing system. The remaining sequences were used for assignment of TRAV and TRAJ for TCRα sequences and of TRBV and TRBJ for TCRβ sequences. Each sequence was assigned to the reference with the highest identity in a data set of reference sequences for 54 TRAV, 61 TRAJ, 65 TRBV and 14 TRBJ genes, including pseudogenes, and a data set of Open Reading Frame (ORF) reference sequences, available from the ImMunoGeneTics information System® (IMGT) database (http colon//www dot imgt dot org). Data processing, assignment, and data aggregation were performed automatically by using repertoire analysis software (Repertoire Genesis, RG) developed independently by the inventors. RG executed BLASTN, an automatic aggregation program, a graphic program for TRV and TRJ usage, and a program for sequence homology search using CDR3 length distribution. Sequence homology between a query sequence and an entry sequence at the nucleotide level was calculated automatically. Parameters that increased sensitivity and precision (E value threshold, minimum kernel, high-scoring segment pair (HSP) score) were carefully optimized for each repertoire analysis.
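RG performs this assignment with BLASTN. As a much cruder stand-in that only illustrates assigning a read to the most similar reference, the Python sketch below scores each reference by the number of shared k-mers. The k-mer scoring, the value k = 8, and the toy reference sequences are hypothetical; they are neither the RG algorithm nor real IMGT sequences:

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(read: str, references: Dict[str, str], k: int = 8,
                   min_shared: int = 1) -> Optional[str]:
    """Return the reference sharing the most k-mers with the read (first wins ties)."""
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0
    for name, ref in references.items():
        score = len(read_kmers & kmers(ref, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_shared else None


# Toy, made-up reference segments (not real TRV/TRJ sequences).
toy_refs = {"REF_A": "ACGTACGTTTGACCAGTA", "REF_B": "TTGGCCAATTGGCCAAGG"}
print(best_reference("GGACGTACGTTTGACC", toy_refs))
```

Unlike BLASTN, this ignores alignment position, gaps, and E values, so it is only suitable for illustrating the "highest identity wins" rule.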

Data Analysis

The nucleotide sequence of CDR3, spanning from the conserved cysteine at position 104 (Cys104, IMGT numbering) to the conserved phenylalanine at position 118 (Phe118) and the subsequent glycine (Gly119), was translated into a deduced amino acid sequence. A unique sequence read (USR) was defined as a sequence read that is not identical to any other USR in its combination of TRV, TRJ, and deduced CDR3 amino acid sequence; reads identical in all three were counted as copies of the same USR. The number of copies of each USR was automatically counted by the RG software for each sample, and the USRs were then ranked by copy number. The frequency (%) of sequence reads containing each TRAV, TRAJ, TRBV and TRBJ gene among all sequence reads was calculated.
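Once each read is reduced to its TRV gene, deduced CDR3 amino acid sequence, and TRJ gene, counting USR copies and gene usage is a tallying exercise. A minimal Python sketch, assuming reads have already been assigned and translated; the example tuples reuse CDR3 sequences from Table 3-12-2 purely as input data, and the helper names are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Tuple

Read = Tuple[str, str, str]  # (TRV gene, deduced CDR3 amino acid sequence, TRJ gene)


def rank_usrs(reads: List[Read]) -> List[Tuple[Read, int]]:
    """Collapse identical (TRV, CDR3, TRJ) reads into USRs, ranked by copy number."""
    return Counter(reads).most_common()


def gene_usage_percent(reads: List[Read], field: int) -> Dict[str, float]:
    """Percentage of all reads using each gene; field 0 = TRV, field 2 = TRJ."""
    counts = Counter(read[field] for read in reads)
    total = sum(counts.values())
    return {gene: 100.0 * n / total for gene, n in counts.items()}


reads = [
    ("TRBV19", "CASSMIREYNEQFFG", "TRBJ2-1"),
    ("TRBV19", "CASSMIREYNEQFFG", "TRBJ2-1"),
    ("TRBV29-1", "CSVAAGVNYEQYFG", "TRBJ2-7"),
]
print(rank_usrs(reads))
print(gene_usage_percent(reads, field=0))
```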

Search for USR Shared Among Samples

To search for sequences shared among samples, a character string of the form “TRV gene name”_“deduced amino acid sequence of CDR3 region”_“TRJ gene name” (e.g., TRBV1_CASTRVVJFG_TRBJ2-5) for each USR of an individual was used as a TCR identifier. Each TCR identifier in a sample was searched for in the read data sets of all other samples.
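Building the identifier string and searching for it across samples amounts to an inverted index from identifier to sample. A Python sketch, assuming each sample has already been reduced to a set of identifiers; treating an identifier found in two or more samples as shared mirrors the public-TCR criterion used in this Example, and the placeholder identifiers are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def tcr_identifier(trv: str, cdr3_aa: str, trj: str) -> str:
    """Build the TRV_CDR3_TRJ identifier string used for cross-sample search."""
    return f"{trv}_{cdr3_aa}_{trj}"


def shared_identifiers(samples: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each identifier found in two or more samples to the samples containing it."""
    owners = defaultdict(set)
    for sample, identifiers in samples.items():
        for ident in identifiers:
            owners[ident].add(sample)
    return {ident: sorted(names) for ident, names in owners.items() if len(names) >= 2}


a = tcr_identifier("TRBV1", "CASTRVVJFG", "TRBJ2-5")
samples = {"H001": {a, "ONLY_IN_H001"}, "H002": {a}, "H003": {"ONLY_IN_H003"}}
print(shared_identifiers(samples))
```

Building the index once makes the search linear in the total number of identifiers instead of comparing every sample against every other.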

Diversity Index and Similarity Index

To estimate TCR diversity in the deep sequencing data, several diversity indices, namely Simpson's indices and the Shannon-Weaver index, were calculated by using the function “diversity” of the vegan package in R. The indices were calculated from the number of types per sample and the number of individuals per sample, as measures of ecological biodiversity. For the deep sequencing data, USRs and their copy numbers were used as types and individuals, respectively. Simpson's index (1−λ) was defined as follows:

$1 - \lambda = 1 - \sum_{i=1}^{S} \frac{n_{i}(n_{i}-1)}{N(N-1)} \quad \text{[Numeral 4-1]}$

(wherein N is the total number of sequence reads, n_i is the number of copies of the ith USR, and S is the number of types of USR). The value ranges from 0 to 1, where 1 indicates a high level of diversity and 0 indicates low diversity. The inverse Simpson's index (1/λ) was also calculated. The Shannon-Weaver index (H′) was used as another diversity index and defined as follows:

$H' = -\sum_{i=1}^{S} \frac{n_{i}}{N} \ln \frac{n_{i}}{N} \quad \text{[Numeral 4-2]}$

(wherein N is the total number of sequence reads, n_i is the number of copies of the ith USR, and S is the number of types of USR). These diversity indices can be biased by differences in the number of reads among samples. Thus, the number of sequence reads for each sample was standardized to the minimum number of sequence reads among the samples (Venturi V et al. (2007) J Immunol Methods 321: 182-195). To standardize the sample size, random sampling without replacement was repeated 1000 times to calculate each diversity index by using an R program. The median value of the indices was used as the diversity index for a sample.
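A Python sketch of these indices and of the subsampling step, assuming USR copy numbers per sample are already available. It follows Numerals 4-1 and 4-2 as written rather than calling the vegan package (`vegan::diversity(index = "simpson")` uses the proportion-based form 1 − Σp_i², which differs slightly from Numeral 4-1 at small read counts). The subsampling depth would be the minimum read count across samples; the copy numbers below are hypothetical:

```python
import math
import random
from collections import Counter
from statistics import median
from typing import Callable, List


def simpson(counts: List[int]) -> float:
    """Simpson's index 1 - lambda (Numeral 4-1); requires at least 2 reads."""
    n_total = sum(counts)
    lam = sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))
    return 1.0 - lam


def inverse_simpson(counts: List[int]) -> float:
    """1 / lambda; infinite when every USR is a singleton (lambda == 0)."""
    n_total = sum(counts)
    lam = sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))
    return math.inf if lam == 0 else 1.0 / lam


def shannon(counts: List[int]) -> float:
    """Shannon-Weaver index H' (Numeral 4-2), natural logarithm."""
    n_total = sum(counts)
    return -sum((n / n_total) * math.log(n / n_total) for n in counts if n > 0)


def rarefied_median(counts: List[int], depth: int,
                    index: Callable[[List[int]], float],
                    iterations: int = 1000, seed: int = 0) -> float:
    """Median of an index over random subsamples of `depth` reads without replacement."""
    rng = random.Random(seed)
    pool = [usr for usr, n in enumerate(counts) for _ in range(n)]
    values = []
    for _ in range(iterations):
        sub = Counter(rng.sample(pool, depth))
        values.append(index(list(sub.values())))
    return median(values)


usr_copies = [5, 3, 1, 1]  # hypothetical USR copy numbers for one sample
print(simpson(usr_copies), inverse_simpson(usr_copies), shannon(usr_copies))
print(rarefied_median(usr_copies, depth=6, index=shannon, iterations=200))
```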

To estimate the similarity of TCR repertoires among healthy individuals, the Morisita-Horn index (C_H) was defined as follows:

$C_{H} = \frac{2 \sum_{i=1}^{S} x_{i} y_{i}}{\left( \frac{\sum_{i=1}^{S} x_{i}^{2}}{X^{2}} + \frac{\sum_{i=1}^{S} y_{i}^{2}}{Y^{2}} \right) X Y} \quad \text{[Numeral 4-3]}$

(wherein x_i is the number of copies of the ith USR among all X reads of one sample, y_i is the number of copies of the ith USR among all Y reads of another sample, and S is the number of USRs). To standardize the sample size, random sampling without replacement was repeated 1000 times to calculate a similarity index by using an R program (Venturi V et al. (2008) J Immunol Methods 329: 67-80). The median value was used as the similarity index between a pair of samples.
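A Python sketch of Numeral 4-3 together with the repeated subsampling, assuming each sample is given as a mapping from USR to copy number. The subsampling depth is taken as the smaller of the two read totals, and the sample data are hypothetical:

```python
import random
from collections import Counter
from statistics import median
from typing import Dict


def morisita_horn(x: Dict[str, int], y: Dict[str, int]) -> float:
    """Morisita-Horn index C_H (Numeral 4-3) from USR -> copy-number maps."""
    X, Y = sum(x.values()), sum(y.values())
    dx = sum(v * v for v in x.values()) / X ** 2
    dy = sum(v * v for v in y.values()) / Y ** 2
    shared = x.keys() & y.keys()  # USRs absent from either sample contribute 0
    numerator = 2 * sum(x[k] * y[k] for k in shared)
    return numerator / ((dx + dy) * X * Y)


def subsample(counts: Dict[str, int], depth: int, rng: random.Random) -> Counter:
    """Draw `depth` reads without replacement and re-count USRs."""
    pool = [usr for usr, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def median_morisita_horn(x: Dict[str, int], y: Dict[str, int],
                         iterations: int = 1000, seed: int = 0) -> float:
    """Median C_H over repeated subsampling of both samples to a common depth."""
    rng = random.Random(seed)
    depth = min(sum(x.values()), sum(y.values()))
    return median(morisita_horn(subsample(x, depth, rng), subsample(y, depth, rng))
                  for _ in range(iterations))


sample_1 = {"USR_a": 6, "USR_b": 3, "USR_c": 1}  # hypothetical
sample_2 = {"USR_a": 2, "USR_d": 5}
print(median_morisita_horn(sample_1, sample_2, iterations=200))
```

C_H is 1 for identical USR distributions and 0 when the two samples share no USR.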

Statistics

Statistical significance was tested by the nonparametric Mann-Whitney U test by using GraphPad Prism software (version 4.0, San Diego, Calif.). A value of p<0.05 was considered statistically significant.
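For reproducing such comparisons outside GraphPad Prism, a Python sketch of the Mann-Whitney U statistic with a two-sided normal-approximation p-value follows. It omits the tie and continuity corrections and the exact small-sample distribution that statistical packages may apply, so its p-values are approximate; the function names and example values are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U, approximate two-sided p) for two independent samples."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical diversity index values for two groups.
u, p = mann_whitney_u([0.91, 0.93, 0.95], [0.97, 0.98, 0.99])
print(u, p, p < 0.05)
```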

(Results)

Repertoire Analysis Software

The cloud-based software platform RG developed in the present study is a high-speed, accurate, and convenient computational system for TCR repertoire analysis. RG provides a consolidated software package for (1) assignment of V, D, and J segments, (2) calculation of sequence identity, (3) extraction of CDR3 sequences, (4) counting of identical reads, (5) amino acid translation, (6) frame analysis (stop codons and frameshifts), and (7) analysis of CDR3 length. After sequencing data from an NGS sequencer are uploaded, V, D, and J segments are identified based on sequence similarity by using optimized parameters. The number of reads is aggregated automatically, and the processed data, tables, and graphs can then be readily downloaded.

Number of Reads, Error Rates, and Nonproductive Reads

The present inventors performed high-throughput sequencing of TCRα and TCRβ genes in PBMCs derived from 20 healthy individuals. A total of 172,109 and 94,928 sequence reads were assigned to TCRα and TCRβ repertoire analysis, respectively, by using the RG program (Tables 4-2 and 4-3).

TABLE 4-2 Number of unique reads, nucleotides and reads obtained from PBMCs of 20 healthy individuals

| Healthy individuals | Total number of unique reads | Total number of reads | Total number of nucleotides | Average number of nucleotides per read |
|---|---|---|---|---|
| H001 | 5,902 | 8,805 | 3,732,329 | 423.9 |
| H002 | 2,809 | 5,812 | 2,477,523 | 426.3 |
| H003 | 1,707 | 7,334 | 3,269,817 | 445.8 |
| H004 | 5,586 | 6,981 | 3,047,583 | 436.6 |
| H005 | 3,250 | 5,815 | 2,507,968 | 431.3 |
| H006 | 4,267 | 7,043 | 3,052,709 | 433.4 |
| H007 | 5,467 | 6,462 | 2,784,462 | 430.9 |
| H008 | 3,350 | 6,206 | 2,700,726 | 435.2 |
| H009 | 4,966 | 7,267 | 3,119,902 | 429.3 |
| H010 | 5,019 | 6,641 | 2,861,613 | 430.9 |
| H011 | 8,327 | 16,254 | 6,188,290 | 380.7 |
| H012 | 2,289 | 7,203 | 2,386,452 | 331.3 |
| H013 | 3,025 | 6,118 | 2,185,135 | 357.2 |
| H014 | 2,221 | 5,790 | 2,237,670 | 386.5 |
| H015 | 3,976 | 7,009 | 2,944,857 | 420.2 |
| H016 | 8,152 | 20,493 | 8,147,320 | 397.6 |
| H017 | 1,565 | 3,244 | 1,042,278 | 321.3 |
| H018 | 6,303 | 14,768 | 6,169,277 | 417.7 |
| H019 | 10,001 | 17,719 | 7,165,649 | 404.4 |
| H020 | 3,052 | 5,145 | 2,096,401 | 407.5 |
| Mean | 4,561.7 | 8,605.5 | 3,505,898 | 407.4 |
| SD | 2,326.4 | 4,693.4 | 1,870,983 | 35.4 |
| Total number | 91,234 | 172,109 | 70,117,961 | |

PBMC, peripheral blood mononuclear cells; SD, standard deviation

TABLE 4-3 Number of unique reads, nucleotides and reads obtained from PBMCs of 20 healthy individuals

| Healthy individuals | Total number of unique reads | Total number of reads | Total number of nucleotides | Average number of nucleotides per read |
|---|---|---|---|---|
| H001 | 3,092 | 4,007 | 1,626,917 | 406.0 |
| H002 | 2,069 | 3,624 | 1,620,164 | 447.1 |
| H003 | 979 | 3,602 | 1,595,988 | 443.1 |
| H004 | 3,025 | 3,664 | 1,637,322 | 446.9 |
| H005 | 1,275 | 1,970 | 884,000 | 448.7 |
| H006 | 1,274 | 2,122 | 952,553 | 448.9 |
| H007 | 3,301 | 3,760 | 1,665,584 | 443.0 |
| H008 | 2,089 | 3,956 | 1,737,410 | 439.2 |
| H009 | 2,664 | 3,575 | 1,595,609 | 446.3 |
| H010 | 2,761 | 3,384 | 1,517,514 | 448.4 |
| H011 | 5,198 | 8,182 | 3,369,499 | 411.8 |
| H012 | 4,882 | 11,759 | 4,421,616 | 376.0 |
| H013 | 4,272 | 8,117 | 2,793,454 | 344.1 |
| H014 | 2,119 | 4,652 | 1,578,364 | 339.3 |
| H015 | 3,735 | 5,298 | 1,929,452 | 364.2 |
| H016 | 2,663 | 4,086 | 1,582,494 | 387.3 |
| H017 | 2,348 | 4,341 | 1,676,546 | 386.2 |
| H018 | 3,044 | 4,807 | 1,804,197 | 375.3 |
| H019 | 4,317 | 5,923 | 2,285,404 | 385.9 |
| H020 | 2,875 | 4,099 | 1,639,809 | 400.1 |
| Mean | 2,899.1 | 4,746.4 | 1,895,695 | 409.4 |
| SD | 1,156.2 | 2,280.0 | 806,581 | 37.8 |
| Total number | 57,982 | 94,928 | 37,913,896 | |

PBMC, peripheral blood mononuclear cells; SD, standard deviation

A total of 91,234 and 57,982 unique sequence reads (USRs) were identified for TCRα and TCRβ, respectively. The nucleotide sequences obtained by Roche 454 sequencing were about 400 bp per read (mean length±SD, TCRα: 407.4±35.4 bp, TCRβ: 409.4±37.8 bp), which is long enough to identify a TCR gene over the range from the V region to the J region. To assess the precision and quality of NGS sequencing, the inventors calculated the frequency of mismatching nucleotides between a query sequence and a reference sequence as the error rate. The error rate was 0.72±0.18% for TRAV, 0.54±0.08% for TRAJ, 0.70±0.15% for TRBV, and 0.50±0.12% for TRBJ (Table 4-4).

TABLE 4-4 Percentage of mismatching nucleotides in TCR sequences

| Healthy individual | TRAV (%) | TRAJ (%) | TRBV (%) | TRBJ (%) |
|---|---|---|---|---|
| H001 | 0.54 | 0.40 | 0.58 | 0.54 |
| H002 | 0.92 | 0.61 | 0.89 | 0.36 |
| H003 | 0.93 | 0.63 | 0.83 | 0.36 |
| H004 | 0.93 | 0.60 | 0.85 | 0.40 |
| H005 | 0.87 | 0.61 | 0.90 | 0.37 |
| H006 | 0.89 | 0.56 | 0.77 | 0.34 |
| H007 | 0.89 | 0.59 | 0.80 | 0.46 |
| H008 | 0.89 | 0.61 | 0.89 | 0.49 |
| H009 | 0.93 | 0.63 | 0.91 | 0.39 |
| H010 | 0.88 | 0.61 | 0.85 | 0.35 |
| H011 | 0.54 | 0.48 | 0.54 | 0.52 |
| H012 | 0.58 | 0.65 | 0.61 | 0.69 |
| H013 | 0.58 | 0.56 | 0.57 | 0.64 |
| H014 | 0.68 | 0.48 | 0.54 | 0.59 |
| H015 | 0.65 | 0.40 | 0.57 | 0.70 |
| H016 | 0.55 | 0.50 | 0.57 | 0.59 |
| H017 | 0.57 | 0.53 | 0.57 | 0.57 |
| H018 | 0.51 | 0.47 | 0.56 | 0.53 |
| H019 | 0.53 | 0.44 | 0.63 | 0.66 |
| H020 | 0.47 | 0.43 | 0.54 | 0.50 |
| Mean | 0.72 | 0.54 | 0.70 | 0.50 |
| SD | 0.18 | 0.08 | 0.15 | 0.12 |

SD, standard deviation
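The error rate defined above (mismatching nucleotides between a query and its reference) can be sketched as follows. This assumes the query has already been aligned to the reference segment without gaps, and it pools mismatches over all reads of an individual; whether Table 4-4 pools reads or averages per-read rates is not stated, so the pooling and the function names are assumptions.

```python
def count_mismatches(query, reference):
    """Number of mismatching positions between two gap-free aligned sequences of equal length."""
    if len(query) != len(reference):
        raise ValueError("query and reference must be aligned to the same length")
    return sum(q != r for q, r in zip(query.upper(), reference.upper()))


def mismatch_percent(query, reference):
    """Error rate (%) of one aligned read against its reference segment."""
    return 100.0 * count_mismatches(query, reference) / len(reference)


def pooled_error_rate(alignments):
    """Error rate (%) pooled over (query, reference) pairs, e.g. all TRAV alignments of one individual."""
    mismatches = sum(count_mismatches(q, r) for q, r in alignments)
    length = sum(len(r) for _, r in alignments)
    return 100.0 * mismatches / length
```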

The error rates were slightly lower than the mean error rate of 1.07% reported for 454 sequencing in a previous study (Gilles A et al. (2011) BMC Genomics 12: 245). The error rate was significantly higher in the V region than in the J region (AV vs. AJ: p<0.05, BV vs. BJ: p<0.0001); that is, sequence reliability was higher in the region close to the sequencing primer. The frequency of nonproductive reads, i.e., reads with a frameshift (out-of-frame) in the CDR3 region or with a stop codon, was calculated (Table 4-5).

TABLE 4-5 Frequency of out-of-frame unique sequence reads of TCRα and TCRβ

| Healthy individual | TCRα frequency (%) | TCRβ frequency (%) |
|---|---|---|
| H001 | 21.9 | 27.0 |
| H002 | 35.8 | 26.1 |
| H003 | 52.3 | 42.7 |
| H004 | 27.8 | 20.7 |
| H005 | 33.8 | 22.3 |
| H006 | 31.8 | 19.8 |
| H007 | 28.6 | 19.5 |
| H008 | 31.0 | 27.4 |
| H009 | 31.2 | 21.5 |
| H010 | 30.6 | 19.1 |
| H011 | 29.1 | 27.5 |
| H012 | 41.5 | 46.0 |
| H013 | 33.4 | 38.1 |
| H014 | 29.4 | 39.4 |
| H015 | 21.1 | 34.3 |
| H016 | 34.3 | 32.1 |
| H017 | 32.7 | 33.2 |
| H018 | 27.9 | 29.2 |
| H019 | 27.4 | 32.1 |
| H020 | 21.7 | 28.3 |
| Mean | 31.2 | 29.3 |
| SD | 7.0 | 7.9 |

SD, standard deviation

There was no significant difference in the percentage of nonproductive unique sequence reads between TCRα and TCRβ (31.2±7.0% vs. 29.3±7.9%, P=0.31).
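A simplified sketch of the frame analysis is given below. It assumes the CDR3 nucleotide sequence has already been extracted from the first base of the Cys104 codon to the last base of the Phe118 codon, and it classifies a read as nonproductive if that span is not a multiple of 3 or contains an in-frame stop codon. Real frame analysis would check the full V-J read; the restriction to CDR3 and the function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(cdr3_nt):
    """True if the Cys104-to-Phe118 span is in frame and contains no stop codon."""
    seq = cdr3_nt.upper()
    if len(seq) % 3 != 0:
        return False  # out-of-frame (frameshift)
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def nonproductive_percent(cdr3_sequences):
    """Percentage of unique sequence reads that are out-of-frame or contain a stop codon."""
    nonproductive = sum(not is_productive(s) for s in cdr3_sequences)
    return 100.0 * nonproductive / len(cdr3_sequences)
```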

Expression of TCR Genes Including ORFs and Pseudogenes

To determine the usage of TRV and TRJ genes in TCR sequence reads, the number of copies (number of reads) of USRs having each TRV or TRJ gene was counted. Individual USRs were ranked in order of copy number, and the frequency percentage of each TRV and TRJ gene was calculated (FIG. 51 and FIG. 52). In the TCRα repertoire, 8 pseudogenes (AV8-5, AV11, AV15, AV28, AV31, AV32, AV33 and AV37) were not expressed in healthy individuals. AV8-7, classified as an ORF (defined by IMGT on the basis of alterations in regulatory elements, recombination signals and/or splice sites), was hardly expressed (43 reads in 11 of 20 individuals). AV18 and AV36 (classified as functional genes) were not expressed in healthy individuals. Furthermore, the functional genes AV7 and AV9-1 were expressed only at low levels, in one individual (9 reads) and 2 individuals (3 reads), respectively. Among the 8 AJ genes classified as ORFs (AJ1, AJ2, AJ19, AJ25, AJ35, AJ58 and AJ61), AJ35 and AJ58 were expressed in all 20 individuals, whereas AJ25 and AJ61 were expressed at low levels in 3 individuals (21 reads) and 7 individuals (35 reads), respectively. AJ1, AJ2, AJ19 and AJ59 were not present in any individual. The three pseudogenes AJ51, AJ55 and AJ60 were not expressed in any individual. Only 3 reads of the functional gene AJ14 were detected, from 3 individuals.

For the TCRβ gene, 11 pseudogenes (BV1, BV3-2, BV5-2, BV7-5, BV8-1, BV8-2, BV12-1, BV12-2, BV21-1, BV22-1, and BV26) were not expressed in healthy individuals. Among the 5 ORF genes, BV5-7 (32 reads in 13 individuals), BV6-7 (13 reads in 8 individuals), and BV17 (3 reads in 1 individual) were expressed only at low levels. The BV7-1 ORF gene was not observed in any individual, whereas BV23-1 was expressed in all 20 individuals. Among the BJ genes, the pseudogene BJ2-2P was not expressed.
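The usage calculation described above (copies of USRs summed per gene and expressed as a percentage of reads) can be sketched as follows. The input layout, dicts with hypothetical keys 'v', 'j' and 'copies', and the function names are assumptions made for illustration; genes absent from the result are those with no reads, such as the unexpressed pseudogenes listed above.

```python
from collections import defaultdict


def gene_usage(usrs, segment):
    """Frequency (%) of reads carrying each gene of `segment` ('v' or 'j'), most used first.

    `usrs` is an iterable of dicts with hypothetical keys 'v', 'j' and 'copies'.
    """
    reads_per_gene = defaultdict(int)
    for usr in usrs:
        reads_per_gene[usr[segment]] += usr["copies"]
    total = sum(reads_per_gene.values())
    ranked = sorted(reads_per_gene.items(), key=lambda item: item[1], reverse=True)
    return {gene: 100.0 * n / total for gene, n in ranked}


def undetected(reference_genes, usage):
    """Reference genes (e.g. pseudogenes or ORFs) with no reads in a sample."""
    return [gene for gene in reference_genes if gene not in usage]


usrs = [
    {"v": "TRAV1-2", "j": "TRAJ33", "copies": 3},
    {"v": "TRAV1-2", "j": "TRAJ12", "copies": 1},
    {"v": "TRAV12-1", "j": "TRAJ33", "copies": 4},
]
print(gene_usage(usrs, "j"))
```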

Recombination of TRAV and TRAJ at a Low Frequency

Recombination of 41 TRAV and 50 TRAJ genes (excluding pseudogenes, ORFs and genes that were not sufficiently expressed) can generate a total of 2050 AV-AJ combinations (FIG. 53). Of these, 1969 AV-AJ combinations (96.0%) were detected in the 20 individuals, indicating that almost all AV-AJ recombinations were used in TCR transcripts without restriction. However, AV1-1 to AV6 genes were rarely recombined with AJ50 to AJ58. Similarly, recombination of AV35 to AV41 genes with AJ3 to AJ16 was hardly observed. Considering the positions of these gene segments on the chromosome, the results indicate that AV-AJ recombination rarely occurs between a proximal AV gene and a distal AJ gene or between a distal AV gene and a proximal AJ gene.

For TCRβ, 650 combinations can be generated by 50 BV genes (excluding 11 pseudogenes and 5 ORFs) and 13 BJ genes (excluding pseudogenes). Of these, 605 BV-BJ combinations (93.1%) were used in the 20 individuals. No restriction was observed in the pairing of TRBV with TRBJ.
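The combination counts above follow from simple products (41 × 50 = 2050 and 50 × 13 = 650). The sketch below computes how many of the possible V-J combinations are detected, and the read share of each combination, which is the kind of matrix a 3D repertoire plot displays. It is an illustration with hypothetical function names and an assumed (V, J, copies) tuple layout.

```python
from collections import Counter
from itertools import product


def pairing_coverage(v_genes, j_genes, observed_pairs):
    """Return (possible, detected, percent) for V-J combinations among the given gene lists."""
    possible = set(product(v_genes, j_genes))
    detected = possible & set(observed_pairs)   # pairs involving excluded genes are ignored
    return len(possible), len(detected), 100.0 * len(detected) / len(possible)


def pair_share(usrs):
    """Share (%) of reads for each (V, J) combination; `usrs` holds (v, j, copies) tuples."""
    reads = Counter()
    for v, j, copies in usrs:
        reads[(v, j)] += copies
    total = sum(reads.values())
    return {pair: 100.0 * n / total for pair, n in reads.items()}


# Arithmetic from the text: 1969 of 2050 TRAV-TRAJ and 605 of 650 TRBV-TRBJ combinations.
print(round(100 * 1969 / 2050, 1), round(100 * 605 / 650, 1))  # 96.0 93.1
```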

Preferential Use of TRV and TRJ Repertoires in Healthy Individuals

To elucidate the usage of TRV and TRJ in all TCR transcripts, the frequency of USRs having each TRV or TRJ gene was calculated (FIG. 51 and FIG. 52). The preferential usage of some TRAV genes was similar to previous results (6) obtained with a quantitative hybridization-based assay. Certain TRBV genes were used preferentially in the TRBV repertoire: the top 3, TRBV29-1 (BV4S1 in Arden nomenclature), TRBV20-1 (BV2S1) and TRBV28 (BV3S1), accounted for one-third of all sequence reads. This was similar to the result (6) obtained in a previous study by the inventors using a microplate hybridization assay. Usage varied greatly among TRBJ genes: TRBJ2-1 and TRBJ2-7 were very highly expressed, whereas TRBJ1-3, TRBJ1-4, TRBJ1-6, TRBJ2-4 and TRBJ2-6 were expressed at low levels.

Three-Dimensional (3D) View of TCR Repertoire Usage

To visualize the usage of TCRs with each combination of a TRV gene and a TRJ gene, the inventors created a 3D portrayal of the TCR repertoire (FIG. 54 and FIG. 55). The advantage of a 3D image is that the level of TCR diversity and the predominance of specific TRV-TRJ combinations can be readily observed. For TCRβ, there was hardly any preferential pairing between TRBV and TRBJ genes; the frequency of each combination depended on the usage of the individual TRBV and TRBJ genes. BV29-1/BJ2-7, BV29-1/BJ2-1, BV29-1/BJ2-3 and BV20-1/BJ2-7 were used at the highest frequencies among all combinations, whereas the others were expressed at low frequencies. In contrast, the 3D image of the TCRα repertoire showed low-level expression of TRAV and TRAJ genes over a wide distribution, with every combination having a share of less than 1%. Notably, TCR reads having AV1-2 and AJ33 were highly expressed in all healthy individuals (mean±SD: 0.99±0.85).

Digital CDR3 Chain Length Distribution

CDR3 length distribution analysis, called CDR3 size spectratyping (Yassai M et al. (2000) J Immunol 165: 3706-3712; Yassai M et al. (2002) J Immunol 168: 3801-3807) or immunoscope analysis (Pannetier C et al. (1993) Proc Natl Acad Sci USA 90: 4319-4323; Pannetier C et al. (1995) Immunol Today 16: 176-181), has been used effectively to estimate the diversity of a TCR repertoire. The technique is based on the distribution of peaks of CDR3-containing PCR amplicons separated by gel electrophoresis. In the present study, the nucleotide sequence length of each determined TCR from the conserved cysteine at position 104 (Cys104, IMGT numbering) to the conserved phenylalanine at position 118 (Phe118) was calculated automatically. This provides a simple visual method for estimating the diversity and clonality of TCRs from NGS data. RG can generate a diagram of the digital CDR3 length distribution for each V region. The CDR3 length distributions of both TCRα and TCRβ resembled a normal distribution but were not completely symmetric (FIG. 56). CDR3 was shorter in TCRα than in TCRβ (mean±SD: 41.2±8.3 vs. 42.8±6.1). TCRα showed a larger positive skewness than TCRβ (skewness index: 11.1 vs 5.41), indicating that the TCRα distribution was concentrated on the left (shorter) side. Furthermore, TCRα had a higher kurtosis than TCRβ (kurtosis index: 282.4 vs 176.7), indicating a more sharply peaked distribution for TCRα.
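The digital length distribution and its shape statistics can be sketched as below. The sketch assumes the CDR3 nucleotide sequences (Cys104 to Phe118) are already extracted and uses moment-based skewness and excess kurtosis; the source does not state which definitions RG uses for its skewness and kurtosis indices, so the numbers from this sketch need not match the values reported above. The function names are hypothetical.

```python
import math
from collections import Counter


def length_distribution(cdr3_sequences):
    """Digital spectratype: number of reads at each CDR3 nucleotide length."""
    return dict(sorted(Counter(len(s) for s in cdr3_sequences).items()))


def length_summary(lengths):
    """Mean, sample SD, moment skewness and excess kurtosis of CDR3 lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    m4 = sum((x - mean) ** 4 for x in lengths) / n
    sd = math.sqrt(m2 * n / (n - 1))        # sample standard deviation
    skewness = m3 / m2 ** 1.5               # > 0: long tail toward longer CDR3
    excess_kurtosis = m4 / m2 ** 2 - 3.0    # > 0: more sharply peaked than a normal curve
    return mean, sd, skewness, excess_kurtosis
```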

Diversity of TCRα and TCRβ Repertoires

To assess the diversity of the TCR repertoires, the inventors calculated diversity indices (Simpson's index, Shannon-Weaver's index and the like) and the average number of copies per USR (FIG. 57). The average number of copies per USR differed significantly between TCRα and TCRβ (2.0±0.72 vs 1.70±0.57). However, there was no significant difference in the inverse Simpson's index (D) or Shannon-Weaver's index (H) between TCRα and TCRβ (D: 710.3±433.0 vs 729.7±493.9; H: 7.02±0.33 vs 6.97±0.43). These results show that there is no difference in diversity between TCRα and TCRβ in healthy individuals.

Similarity of TCRα and TCRβ Repertoires Among Healthy Individuals

To examine the correlation of gene usage among individuals, the frequency percentages of each TRV and TRJ gene were plotted in scatter plots for all pairs of individuals (FIG. 60), and Spearman's correlation coefficient was calculated for each pair. The correlation coefficients were lower for TRAV than for TRBV (mean±SD, 0.86±0.059 for TRAV vs 0.89±0.038 for TRBV, p<0.001) and lower for TRAJ than for TRBJ (0.74±0.095 for TRAJ vs 0.91±0.063 for TRBJ, p<0.001). These results show that TRV and TRJ expression levels were more similar among healthy individuals for TCRβ than for TCRα.
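The pairwise correlation analysis can be sketched as follows: Spearman's coefficient is the Pearson correlation of the (tie-averaged) ranks of two gene-usage vectors. The input layout, one list of gene frequencies per individual in a fixed gene order, and the function names are assumptions.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm


def pairwise_spearman(usage):
    """usage: individual -> list of gene frequencies (%) in one fixed gene order."""
    return {(a, b): spearman(usage[a], usage[b]) for a, b in combinations(sorted(usage), 2)}
```

For 20 individuals this yields 190 unordered pairs, the sample from which the mean±SD coefficients above would be summarized.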

To assess the potential similarity of TCR repertoires among healthy individuals at the clonotype level, the inventors searched for TCR sequence reads shared among individuals. The number of shared TCR reads was counted for all pairs of individuals, and the corresponding frequency was calculated (Table 4-6 and Table 4-7).

TABLE 4-6 Percentage of frequency of TCRα reads shared among all pairs of healthy individuals

| | H001 | H002 | H003 | H004 | H005 | H006 | H007 | H008 | H009 | H010 | H011 | H012 | H013 | H014 | H015 | H016 | H017 | H018 | H019 | H020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H001 | — | 0.71 | 0.29 | 1.04 | 1.29 | 0.80 | 1.61 | 0.78 | 0.89 | 0.84 | 1.12 | 0.87 | 0.73 | 0.77 | 1.28 | 0.92 | 0.77 | 1.05 | 1.13 | 1.11 |
| H002 | 0.34 | — | 0.47 | 0.59 | 0.37 | 0.37 | 0.51 | 0.27 | 0.72 | 0.36 | 0.23 | 0.31 | 0.26 | 0.23 | 0.25 | 0.33 | 0.32 | 0.35 | 0.26 | 0.46 |
| H003 | 0.08 | 0.28 | — | 0.21 | 0.15 | 0.23 | 0.04 | 0.21 | 0.16 | 0.12 | 0.08 | 0.13 | 0.10 | 0.00 | 0.15 | 0.07 | 0.19 | 0.10 | 0.10 | 0.13 |
| H004 | 0.98 | 1.17 | 0.70 | — | 1.05 | 1.05 | 1.32 | 1.25 | 1.41 | 1.24 | 0.82 | 1.00 | 0.79 | 0.68 | 1.26 | 0.72 | 0.83 | 0.78 | 0.78 | 0.79 |
| H005 | 0.71 | 0.43 | 0.29 | 0.61 | — | 0.89 | 0.90 | 0.84 | 0.79 | 0.74 | 0.59 | 0.66 | 0.50 | 0.45 | 0.80 | 0.58 | 0.89 | 0.71 | 0.53 | 0.69 |
| H006 | 0.58 | 0.57 | 0.59 | 0.81 | 1.17 | — | 1.01 | 0.63 | 1.07 | 0.92 | 0.82 | 1.05 | 0.63 | 0.54 | 0.93 | 0.75 | 0.89 | 0.75 | 0.57 | 0.75 |
| H007 | 1.49 | 1.00 | 0.12 | 1.29 | 1.51 | 1.29 | — | 1.13 | 1.31 | 1.28 | 0.94 | 1.00 | 0.69 | 1.04 | 1.58 | 0.88 | 1.28 | 1.14 | 1.05 | 1.05 |
| H008 | 0.44 | 0.32 | 0.41 | 0.75 | 0.86 | 0.49 | 0.70 | — | 0.58 | 0.66 | 0.36 | 0.48 | 0.26 | 0.36 | 0.68 | 0.47 | 0.32 | 0.41 | 0.38 | 0.43 |
| H009 | 0.75 | 1.28 | 0.47 | 1.25 | 1.20 | 1.24 | 1.19 | 0.87 | — | 1.18 | 0.77 | 0.857 | 0.63 | 0.63 | 1.18 | 0.80 | 0.89 | 0.81 | 0.60 | 0.92 |
| H010 | 0.71 | 0.64 | 0.35 | 1.11 | 1.14 | 1.08 | 1.17 | 0.99 | 1.19 | — | 0.90 | 0.96 | 0.79 | 0.59 | 1.16 | 0.74 | 0.64 | 0.95 | 0.86 | 0.85 |
| H011 | 1.58 | 0.68 | 0.41 | 1.25 | 1.51 | 1.59 | 1.43 | 0.90 | 1.29 | 1.49 | — | 1.66 | 6.02 | 0.99 | 2.04 | 1.37 | 1.21 | 1.52 | 1.49 | 1.38 |
| H012 | 0.34 | 0.25 | 0.18 | 0.41 | 0.46 | 0.56 | 0.42 | 0.33 | 0.40 | 0.44 | 0.46 | — | 0.36 | 0.09 | 0.58 | 0.44 | 0.58 | 0.48 | 0.42 | 0.36 |
| H013 | 0.37 | 0.28 | 0.18 | 0.43 | 0.46 | 0.45 | 0.38 | 0.24 | 0.38 | 0.48 | 2.19 | 0.48 | — | 0.41 | 0.58 | 0.33 | 0.38 | 0.38 | 0.46 | 0.56 |
| H014 | 0.29 | 0.18 | 0.00 | 0.27 | 0.31 | 0.28 | 0.42 | 0.24 | 0.28 | 0.26 | 0.26 | 0.09 | 0.30 | — | 0.50 | 0.3 | 0.26 | 0.17 | 0.35 | 0.29 |
| H015 | 0.86 | 0.36 | 0.35 | 0.90 | 0.98 | 0.87 | 1.15 | 0.81 | 0.95 | 0.92 | 0.97 | 1.00 | 0.76 | 0.90 | — | 0.82 | 0.89 | 0.84 | 0.95 | 0.85 |
| H016 | 1.27 | 0.96 | 0.35 | 1.06 | 1.45 | 1.43 | 1.32 | 1.13 | 1.31 | 1.20 | 1.35 | 1.57 | 0.89 | 1.22 | 1.69 | — | 0.83 | 1.19 | 1.20 | 0.66 |
| H017 | 0.20 | 0.18 | 0.18 | 0.23 | 0.43 | 0.33 | 0.37 | 0.15 | 0.28 | 0.20 | 0.23 | 0.39 | 0.20 | 0.18 | 0.35 | 0.16 | — | 0.17 | 0.20 | 0.26 |
| H018 | 1.12 | 0.78 | 0.35 | 0.88 | 1.38 | 1.10 | 1.32 | 0.78 | 1.03 | 1.20 | 1.15 | 1.31 | 0.79 | 0.50 | 1.33 | 0.92 | 0.70 | — | 1.18 | 0.92 |
| H019 | 1.91 | 0.93 | 0.59 | 1.40 | 1.63 | 1.34 | 1.92 | 1.13 | 1.21 | 1.71 | 1.79 | 1.83 | 1.52 | 1.58 | 2.39 | 1.47 | 1.28 | 1.87 | — | 1.77 |
| H020 | 0.58 | 0.50 | 0.23 | 0.43 | 0.65 | 0.54 | 0.59 | 0.39 | 0.56 | 0.52 | 0.50 | 0.48 | 0.56 | 0.41 | 0.65 | 0.258 | 0.51 | 0.44 | 0.54 | — |

TABLE 4-7 Percentage of frequency of TCRβ reads shared among all pairs of healthy individuals

| | H001 | H002 | H003 | H004 | H005 | H006 | H007 | H008 | H009 | H010 | H011 | H012 | H013 | H014 | H015 | H016 | H017 | H018 | H019 | H020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H001 | — | 0.05 | 0.00 | 0.00 | 0.08 | 0.00 | 0.03 | 0.05 | 0.11 | 0.07 | 0.08 | 0.02 | 0.05 | 0.00 | 0.00 | 0.11 | 0.00 | 0.00 | 0.12 | 0.07 |
| H002 | 0.03 | — | 0.00 | 0.03 | 0.08 | 0.00 | 0.00 | 0.05 | 0.04 | 0.00 | 0.02 | 0.02 | 0.00 | 0.00 | 0.00 | 0.04 | 0.04 | 0.00 | 0.02 | 0.00 |
| H003 | 0.00 | 0.00 | — | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 |
| H004 | 0.00 | 0.05 | 0.00 | — | 0.16 | 0.00 | 0.03 | 0.05 | 0.08 | 0.07 | 0.00 | 0.02 | 0.00 | 0.05 | 0.00 | 0.08 | 0.04 | 0.03 | 0.02 | 0.00 |
| H005 | 0.03 | 0.05 | 0.00 | 0.07 | — | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.04 | 0.00 | 0.00 | 0.03 |
| H006 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | 0.03 | 0.00 | 0.04 | 0.04 | 0.02 | 0.02 | 0.00 | 0.05 | 0.03 | 0.00 | 0.04 | 0.00 | 0.02 | 0.00 |
| H007 | 0.03 | 0.00 | 0.00 | 0.03 | 0.00 | 0.08 | — | 0.05 | 0.04 | 0.04 | 0.02 | 0.02 | 0.00 | 0.00 | 0.11 | 0.04 | 0.00 | 0.00 | 0.00 | 0.03 |
| H008 | 0.03 | 0.05 | 0.00 | 0.03 | 0.00 | 0.00 | 0.03 | — | 0.00 | 0.04 | 0.02 | 0.00 | 0.00 | 0.00 | 0.03 | 0.11 | 0.00 | 0.00 | 0.02 | 0.00 |
| H009 | 0.10 | 0.05 | 0.00 | 0.07 | 0.08 | 0.08 | 0.03 | 0.00 | — | 0.11 | 0.06 | 0.02 | 0.00 | 0.00 | 0.03 | 0.04 | 0.00 | 0.03 | 0.02 | 0.00 |
| H010 | 0.06 | 0.00 | 0.00 | 0.07 | 0.00 | 0.08 | 0.03 | 0.05 | 0.11 | — | 0.00 | 0.02 | 0.00 | 0.9 | 0.03 | 0.11 | 0.00 | 0.00 | 0.07 | 0.07 |
| H011 | 0.13 | 0.05 | 0.00 | 0.00 | 0.00 | 0.08 | 0.03 | 0.05 | 0.11 | 0.00 | — | 0.04 | 0.16 | 0.00 | 0.08 | 0.11 | 0.00 | 0.16 | 0.05 | 0.14 |
| H012 | 0.03 | 0.05 | 0.00 | 0.03 | 0.00 | 0.08 | 0.03 | 0.00 | 0.04 | 0.04 | 0.04 | — | 0.30 | 0.14 | 0.21 | 0.45 | 0.26 | 0.03 | 0.021 | 0.00 |
| H013 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.13 | 0.27 | — | 0.14 | 0.16 | 0.19 | 0.13 | 0.03 | 0.02 | 0.03 |
| H014 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.08 | 0.00 | 0.00 | 0.00 | 0.07 | 0.00 | 0.06 | 0.07 | — | 0.00 | 0.04 | 0.21 | 0.00 | 0.00 | 0.03 |
| H015 | 0.00 | 0.00 | 0.00 | 0.00 | 0.08 | 0.31 | 0.00 | 0.05 | 0.04 | 0.04 | 0.06 | 0.16 | 0.14 | 0.00 | — | 0.15 | 0.09 | 0.03 | 0.02 | 0.00 |
| H016 | 0.10 | 0.05 | 0.00 | 0.07 | 0.08 | 0.00 | 0.03 | 0.14 | 0.04 | 0.11 | 0.06 | 0.25 | 0.12 | 0.05 | 0.11 | — | 0.13 | 0.03 | 0.02 | 0.03 |
| H017 | 0.00 | 0.05 | 0.00 | 0.03 | 0.08 | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.07 | 0.24 | 0.05 | 0.11 | — | 0.00 | 0.00 | 0.00 |
| H018 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.4 | 0.00 | 0.10 | 0.02 | 0.02 | 0.00 | 0.03 | 0.04 | 0.00 | — | 0.14 | 0.10 |
| H019 | 0.16 | 0.05 | 0.10 | 0.03 | 0.00 | 0.08 | 0.00 | 0.05 | 0.4 | 0.11 | 0.04 | 0.02 | 0.02 | 0.00 | 0.03 | 0.04 | 0.00 | 0.20 | — | 0.10 |
| H020 | 0.06 | 0.05 | 0.00 | 0.00 | 0.08 | 0.00 | 0.03 | 0.00 | 0.00 | 0.07 | 0.08 | 0.00 | 0.02 | 0.05 | 0.00 | 0.04 | 0.00 | 0.10 | 0.07 | — |

The mean frequency of shared reads was significantly higher for TCRα than for TCRβ (0.76±0.52 vs 0.040±0.057, n=380, p<0.001) (FIG. 58), indicating that the TCRα repertoire contains more TCR reads shared among individuals than the TCRβ repertoire. The Morisita-Horn similarity index was also significantly larger for TCRα than for TCRβ (0.0058±0.0069 vs 0.000096±0.00029, n=190, P<0.001). These results clearly show that TCRα repertoires were more similar among healthy individuals than TCRβ repertoires.
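The pairwise sharing frequency can be sketched as below. Tables 4-6 and 4-7 are asymmetric, so the sketch computes one value per ordered pair (20 × 19 = 380 pairs, matching n=380). The normalization, shared USRs divided by the number of USRs of the second individual of the pair, is an assumption; it appears consistent with Table 4-6, where 0.71% of the 2,809 TCRα USRs of H002 and 0.34% of the 5,902 TCRα USRs of H001 both correspond to about 20 shared sequences. USR identity is taken as exact sequence identity, and the function name is hypothetical.

```python
from itertools import permutations


def shared_frequency(usrs_by_individual):
    """Percentage of b's USRs also found in a, for every ordered pair (a, b).

    usrs_by_individual: individual -> set of USR sequences.
    The normalization by the second individual's USR count is an assumption.
    """
    result = {}
    for a, b in permutations(sorted(usrs_by_individual), 2):
        shared = usrs_by_individual[a] & usrs_by_individual[b]
        result[(a, b)] = 100.0 * len(shared) / len(usrs_by_individual[b])
    return result
```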

TCR Sequence Shared Among Healthy Individuals

A small number of TCR sequences are shared among different healthy individuals; such shared TCRs are called public TCRs. In contrast, most TCRs are specific to each individual (private TCRs). To identify public TCR sequences in the 20 healthy individuals, the inventors searched for TCRα and TCRβ reads shared by two or more individuals. A total of 3,041 public TCRα and 206 public TCRβ sequences were obtained from 90,643 and 57,982 USRs, respectively (Table 4-8).

TABLE 4-8 Number of TCRα and TCRβ sequences shared among multiple healthy individuals

| Number of individuals | Number of shared TCRα | Number of shared TCRβ |
|---|---|---|
| 2 | 2,390 | 196 |
| 3 | 424 | 9 |
| 4 | 125 | 1 |
| 5 | 47 | 0 |
| 6 | 23 | 0 |
| 7 | 9 | 0 |
| 8 | 4 | 0 |
| 9 | 5 | 0 |
| 10 | 5 | 0 |
| 11 | 2 | 0 |
| 12 | 0 | 0 |
| 13 | 3 | 0 |
| 14 | 1 | 0 |
| 15 | 1 | 0 |
| 16 | 2 | 0 |
| 17 | 0 | 0 |
| 18 | 0 | 0 |
| 19 | 0 | 0 |
| 20 | 0 | 0 |
| Total number | 3,041 | 206 |

The number of identical TCR sequences observed in a plurality of healthy individuals (2-20 individuals) was counted.

Public TCRα sequences were more frequent than public TCRβ sequences in peripheral blood lymphocytes (PBL) from healthy individuals. Each public TCRβ sequence was shared by only 2-4 individuals, whereas a public TCRα sequence was observed in as many as 16 individuals. These results show that public TCRα sequences are used more commonly across individuals, whereas the TCRβ repertoire is more specific to each individual. Furthermore, the frequency per individual of TCR sequences shared between a pair of individuals was significantly higher for TCRα (7.9%) than for TCRβ (0.7%). To characterize public TCRα sequences, the inventors compared CDR3 lengths between public and private TCRα sequences and observed that public TCRα had shorter CDR3 than private TCRα (median: 39 vs 42) (FIG. 59).
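The tally behind Table 4-8 can be sketched as follows: each distinct sequence is counted once per individual, and sequences found in two or more individuals are grouped by the number of individuals sharing them. Exact sequence identity and the function name are assumptions made for illustration.

```python
from collections import Counter


def public_histogram(usrs_by_individual):
    """Map 'number of individuals sharing' -> number of distinct sequences (sharing >= 2 only)."""
    individuals_per_sequence = Counter(
        seq for usrs in usrs_by_individual.values() for seq in set(usrs)
    )
    histogram = Counter(n for n in individuals_per_sequence.values() if n >= 2)
    return dict(sorted(histogram.items()))
```

Summing the values of the returned histogram gives the total number of public sequences, which corresponds to the "Total number" row of Table 4-8.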

TCRs Shared by Multiple Individuals Frequently Comprise Invariant TCRα Chains

Public TCRα was observed at a high frequency in PBLs from healthy individuals. To determine the origin of public TCRα, the inventors compared the CDR3 sequences of public TCRα with previously reported sequences. Interestingly, a high percentage of the public TCRα sequences shared by multiple individuals were invariant TCRα chains characteristic of iNKT cells or MAIT cells (Table 4-9).

TABLE 4-9 Invariant TCR observed in public TCRα sequences

| Sharing individuals^(a) | TRAV | TRAJ | CDR3^(b) | SEQ ID NO | Germline-like^(c) | Invariant TCR^(d) |
|---|---|---|---|---|---|---|
| 16 | TRAV1-2 | TRAJ33 | CAVRDSNYQLIW | 1938 | yes | MAIT |
| 16 | TRAV1-2 | TRAJ33 | CAVMDSNYQLIW | 1939 | | MAIT |
| 15 | TRAV1-2 | TRAJ33 | CAVLDSNYQLIW | 1940 | | MAIT |
| 14 | TRAV1-2 | TRAJ12 | CAVMDSSYKLIF | 1941 | yes | |
| 13 | TRAV1-2 | TRAJ33 | CAVTDSNYQLIW | 1942 | | MAIT |
| 13 | TRAV1-2 | TRAJ20 | CAVRDGDYKLSF | 1943 | yes | |
| 13 | TRAV1-2 | TRAJ33 | CAVKDSNYQLIW | 1944 | | MAIT |
| 11 | TRAV1-2 | TRAJ33 | CAAMDSNYQLIW | 1945 | | MAIT |
| 11 | TRAV1-2 | TRAJ33 | CAALDSNYQLIW | 1946 | | MAIT |
| 10 | TRAV9-2 | TRAJ20 | CALNDYKLSF | 1947 | yes | |
| 10 | TRAV1-2 | TRAJ33 | CAVVDSNYQLIW | 1948 | | MAIT |
| 10 | TRAV10 | TRAJ18 | CVVSDRGSTLGRLYF | 1949 | yes | iNKT |
| 10 | TRAV1-2 | TRAJ33 | CAVIDSNYQLIW | 1950 | | MAIT |
| 10 | TRAV13-2 | TRAJ9 | CAENTGGFKTIF | 1951 | yes | |
| 9 | TRAV1-2 | TRAJ33 | CAVSDSNYQLIW | 1952 | | MAIT |
| 9 | TRAV9-2 | TRAJ53 | CALSGGSNYKLTF | 1953 | yes | |
| 9 | TRAV2 | TRAJ36 | CAVEDQTGANNLFF | 1954 | | |
| 9 | TRAV9-2 | TRAJ45 | CALSDSGGGADGLTF | 1955 | | |
| 9 | TRAV1-2 | TRAJ20 | CAVRDRDYKLSF | 1956 | | |
| 8 | TRAV1-2 | TRAJ33 | CAGMDSNYQLIW | 1957 | | MAIT |
| 8 | TRAV21 | TRAJ20 | CAVNDYKLSF | 1958 | yes | |
| 8 | TRAV1-2 | TRAJ33 | CAPMDSNYQLIW | 1959 | | MAIT |
| 8 | TRAV1-2 | TRAJ33 | CASMDSNYQLIW | 1960 | | MAIT |
| 7 | TRAV12-2 | TRAJ30 | CAVNRDDKIIF | 1961 | yes | |
| 7 | TRAV13-2 | TRAJ53 | CAENSGGSNYKLTF | 1962 | yes | |
| 7 | TRAV1-2 | TRAJ33 | CAPLDSNYQLIW | 1963 | | MAIT |
| 7 | TRAV9-2 | TRAJ53 | CALNSGGSNYKLTF | 1964 | yes | |
| 7 | TRAV12-1 | TRAJ20 | CVVNDYKLSF | 1965 | yes | |
| 7 | TRAV9-2 | TRAJ20 | CALSSNDYKLSF | 1966 | yes | |
| 7 | TRAV13-1 | TRAJ15 | CAASNQAGTALIF | 1967 | | |
| 7 | TRAV12-1 | TRAJ49 | CVVNTGNQFYF | 1968 | yes | |
| 7 | TRAV12-1 | TRAJ27 | CVVNTNAGKSTF | 1969 | yes | |
| 6 | TRAV2 | TRAJ9 | CAVEDTGGFKTIF | 1970 | | |
| 6 | TRAV1-2 | TRAJ33 | CAVEDSNYQLIW | 1971 | | MAIT |
| 6 | TRAV21 | TRAJ26 | CAVDNYGQNFVF | 1972 | yes | |
| 6 | TRAV9-2 | TRAJ53 | CALSDSGGSNYKLTF | 1973 | | |
| 6 | TRAV21 | TRAJ12 | CAVMDSSYKLIF | 1974 | yes | |
| 6 | TRAV2 | TRAJ9 | CAVNTGGFKTIF | 1975 | yes | |
| 6 | TRAV1-2 | TRAJ33 | CAVRDGNYQLIW | 1976 | | MAIT |
| 6 | TRAV9-2 | TRAJ8 | CALNTGFQKLVF | 1977 | yes | |
| 6 | TRAV13-2 | TRAJ44 | CAENTGTASKLTF | 1978 | yes | |
| 6 | TRAV1-2 | TRAJ33 | CAATDSNYQLIW | 1979 | | MAIT |
| 6 | TRAV12-2 | TRAJ15 | CAVNQAGTALIF | 1980 | yes | |
| 6 | TRAV13-2 | TRAJ42 | CAENYGGSQGNLIF | 1981 | yes | |
| 6 | TRAV21 | TRAJ30 | CAVLNRDDKIIF | 1982 | | |
| 6 | TRAV2 | TRAJ26 | CAVEDNYGQNFVF | 1983 | yes | |
| 6 | TRAV12-2 | TRAJ20 | CAVNDYKLSF | 1984 | yes | |
| 6 | TRAV12-1 | TRAJ31 | CVVNNARLMF | 1985 | yes | |
| 6 | TRAV2 | TRAJ26 | CAVDNYGQNFVF | 1986 | yes | |
| 6 | TRAV2 | TRAJ3 | CAVDSSASKIIF | 1987 | | |
| 6 | TRAV9-2 | TRAJ23 | CALIYNQGGKLIF | 1988 | yes | |
| 6 | TRAV9-2 | TRAJ9 | CALNTGGFKTIF | 1989 | yes | |
| 6 | TRAV13-2 | TRAJ39 | CALNTGGFKTIF | 1990 | yes | |
| 6 | TRAV1-2 | TRAJ12 | CAVLDSSYKLIF | 1991 | | |
| 6 | TRAV1-2 | TRAJ12 | CAAMDSSYKLIF | 1992 | | |

^(b) Non-germline amino acid sequence was underlined; ^(c) CDR3 sequence without a non-germline sequence is indicated by "yes"; ^(d) MAIT: mucosal-associated invariant T cell, iNKT: invariant natural killer T cell

MAIT cells are reported to express TRAV1-2 and TRAJ33, whereas iNKT cells express TRAV10 and TRAJ18. Many public TCRα sequences used TRAV1-2 and TRAJ33 with different CDR3 sequences. The total frequency percentages of MAIT TCRα (TRAV1-2 and TRAJ33) and iNKT TCRα (TRAV10 and TRAJ18) were 0.82±0.72% and 0.15±0.41% per individual, respectively. Among the 55 public TCRα sequences observed in 6 or more individuals, 17 (31%) were MAIT sequences and 1 (1.8%) was an iNKT sequence (FIG. 53). This percentage increased with the number of individuals sharing the sequence. A germline-like CDR3 sequence, i.e., one containing no amino acids altered from the germline sequence, was observed in 27 of the 38 public TCRα sequences (71%) other than MAIT (TRAV1-2-TRAJ33) and iNKT (TRAV10-TRAJ18) sequences.
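The per-individual MAIT and iNKT frequencies above can be approximated with a sketch that labels reads only by their TRAV-TRAJ pair, as described in the text. This is a simplification (it ignores, for example, CDR3 length), and the (v, j, copies) tuple layout and function names are hypothetical.

```python
# V-J pairs used in the text to define invariant TCRα chains.
INVARIANT_PAIRS = {("TRAV1-2", "TRAJ33"): "MAIT", ("TRAV10", "TRAJ18"): "iNKT"}


def invariant_label(v_gene, j_gene):
    """'MAIT', 'iNKT', or None, based only on the TRAV-TRAJ pair."""
    return INVARIANT_PAIRS.get((v_gene, j_gene))


def invariant_percent(usrs):
    """Percentage of an individual's TCRα reads assigned to each invariant type.

    `usrs` holds (v_gene, j_gene, copies) tuples for one individual.
    """
    total = sum(copies for _, _, copies in usrs)
    result = {"MAIT": 0.0, "iNKT": 0.0}
    for v_gene, j_gene, copies in usrs:
        label = invariant_label(v_gene, j_gene)
        if label is not None:
            result[label] += 100.0 * copies / total
    return result
```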

(Discussion)

High-throughput sequencing has advanced greatly with the development of a wide variety of NGS platforms. Although NGS enables the acquisition of enormous amounts of sequence data, it still requires PCR amplification or enrichment of the target genes rather than of the entire genome or gene library. Multiplex PCR with many gene-specific primers has been widely used for heterogeneous TCR and BCR genes generated by rearrangement of many gene segments. However, the use of multiple primers introduces amplification bias among genes, which prevents accurate estimation of gene frequency. For this reason, the inventors used adaptor-ligation mediated PCR, an unbiased PCR technique, for NGS-based TCR repertoire analysis. This method uses a single set of primers and theoretically enables amplification of all TCR genes without PCR bias. Thus, this method is well suited to accurately estimating the amount of each TCR gene present in a wide range of samples.

The inventors comprehensively investigated the TCRα and TCRβ repertoires of many individuals (n=20) at the clonotype level, assessing a large amount of sequence data (a total of 149,216 unique sequence reads from 267,037 sequence reads). The present study thus elucidated the level of diversity and similarity of the TCR repertoire in healthy individuals and the normal range of gene usage. Compared with Illumina NGS platforms (Freeman J D et al. (2009) Genome Res 19: 1817-1824; Warren R L et al. (2011) Genome Res 21: 790-797; Robins H S et al. (2009) Blood 114: 4099-4107), the present method yields fewer sequence reads per sample, but the reads are longer and of higher quality. With an Illumina platform, the varying depth of different sequences in a CDR3 contig assembled from many shotgun reads can make it difficult to determine the frequency of a TCR clonotype. In contrast, the reads in the present study were long (mean of about 400 bp; Table 4-2 and Table 4-3) and covered the entire CDR3, V and J regions, so that each TCR sequence could be determined from a single read. Direct analysis of individual reads, without joining reads together, is highly likely to reflect the actual frequency of each TCR clonotype accurately. The error rate in the TCR sequences was slightly lower than the mean error rate of 1.07% previously reported for 454 sequencing, and high precision and quality were achieved despite the use of nested PCR. Furthermore, RG, the assignment and aggregation software, can quickly aggregate TRV and TRJ usage and recombination usage. This integrated analysis facilitates the detection of preferential usage of specific TRV and/or TRJ genes and is thus useful for studying immune responses by antigen-specific T cells.

Unlike the widely used multiplex PCR, which typically requires compensation for PCR bias (Carlson C S et al. (2013) Nat Commun 4: 2680), AL-PCR estimates a TCR repertoire accurately without bias. Multiplex PCR studies of CD4⁺ and CD8⁺ cells have reported high expression of TRBV18 (BV18S1 in Arden nomenclature), TRBV19 (BV17S1) and TRBV7-9 (BV6S5) and low expression of TRBV20-1 (BV2S1), TRBV28 (BV3S1), and TRBV29-1 (BV4S1) (Emerson R et al. (2013) J Immunol Methods 391: 14-21). However, flow cytometry analysis showed that TRBV20 and TRBV29 are expressed in large amounts in PBL (van den Beemd R et al. (2000) Cytometry 40: 336-345; Pilch H et al. (2002) Clin Diagn Lab Immunol 9: 257-266; Tzifi F (2013) BMC Immunol 14: 33). The TCR repertoire results of the present inventors are consistent with previous reports (Li S et al. (2013) Nat Commun 4: 2333). Thus, this method provides direct, accurate, and reliable TCR repertoire results.

The recombination analysis showed that AJ-proximal 3′ AV segments recombined with AV-distal 3′ AJ segments at a low frequency, and that AJ-distal 5′ AV segments likewise recombined with AV-proximal 5′ AJ segments at a low frequency. In rearrangement of the TCRαδ locus, activation of the TCRα enhancer (Eα) and the T early alpha (TEA) promoter initiates the first rearrangement between proximal TRAV and TRAJ segments. Subsequent secondary rearrangements use more 5′ TRAV and more distal 3′ TRAJ genes (Huang C et al. (2001) J Immunol 166: 2597-2601; Krangel M S et al. (2004) Immunol Rev 200: 224-232; Pasqual N et al. (2002) J Exp Med 196: 1163-1173; Aude-Garcia C et al. (2001) Immunogenetics 52: 224-230), resulting in restricted usage within the TCRα repertoire (continuous bidirectional recombination model) (Chaumeil J et al. (2012) Embo J 31: 1627-1629). However, according to the locus contraction and DNA loop formation model, every TRAV gene can be recombined with a TRAJ gene in secondary rearrangements (Genolet R et al. (2012) Embo J 31: 4247-4248). In the present results, distal-proximal and proximal-distal TRAV-TRAJ recombination was inefficient, but TRAJ usage was not restricted across TRAV genes and was instead rather evenly distributed. This indicates that recombination frequency varies with the position of the TRAV gene and is likely dependent on the ability of the TRAV and TRAJ loci to form loops.

The potential TCR diversity generated by recombination and nucleotide addition/deletion has been estimated at up to 10¹⁵ (Davis M M et al. (1988) Nature 334: 395-402). Based on NGS, human TCRβ diversity has been estimated at about 3-4×10⁶ (Robins H S et al. (2009) Blood 114: 4099-4107) or about 1×10⁶ (Warren R L et al. (2011) Genome Res 21: 790-797). Furthermore, human TCRα diversity has been reported to be 50% of TCRβ diversity (Arstila T P et al. (1999) Science 286: 958-961). In mice, TCRα diversity is 0.79×10⁴ (Pasqual N et al. (2002) J Exp Med 196: 1163-1173) or 1.18×10⁴ (Cabaniols J P et al. (2001) J Exp Med 194: 1385-1390), which is 10 times lower than TCRβ diversity. The low diversity of TCRα may be due to a difference in the recombination process between TCRα and TCRβ. However, the results of the inventors indicated that TCRα and TCRβ have similar levels of diversity as assessed by the Simpson's index and Shannon-Weaver's index. Similarly, Wang et al. reported that TCR diversity was estimated to be equal for TCRα and TCRβ (0.47×10⁶ vs 0.35×10⁶) (Wang C et al. (2010) Proc Natl Acad Sci USA 107: 1518-1523; Dash P et al. (2011) J Clin Invest 121: 288-295). These results show that, in contrast to previous reports based on a limited number of sequences, large-scale sequencing indicates that the repertoire size of TCRα generated by V-J recombination is comparable to that of TCRβ generated by V-D-J recombination.

Surprisingly, the inventors found that TCRα repertoires are similar among individuals. This is mainly due to the presence of TCR sequences shared among 2 or more individuals (public TCRs). During TCR rearrangement, random nucleotide addition mediated by terminal deoxynucleotidyl transferase and nucleotide deletion occur, greatly increasing the diversity of the CDR3 region. However, public TCRs appear to have germline-like CDR3 sequences that have not undergone such alterations (Table 4-9). Furthermore, public TCRs include many TCR clonotypes with shorter CDR3. These results suggest that the high frequency of public TCRα may be due to a difference in the intrinsic recombination mechanism from that of TCRβ (V-J vs V-D-J).

Notably, public TCRα sequences are present in many individuals. The inventors unexpectedly found that public TCRα sequences include, at a high ratio, invariant TCRα chains derived from MAIT cells or iNKT cells. These functionally important T cells have a homogeneous TCRα and a diverse TCRβ. MAIT cells express a canonical TCRα comprising TRAV1-2 (Vα7.2)-TRAJ33 (Jα33) and are preferentially located in the intestinal lamina propria (Tilloy F et al. (1999) J Exp Med 189: 1907-1921; Treiner E et al. (2003) Nature 422: 164-169). MAIT cells recognize vitamin B2 metabolites presented by the nonclassical MHC class I molecule MR1. CD1d-restricted iNKT cells express an invariant TRAV10 (Vα24)-TRAJ18 (Jα18) chain and a semi-invariant TRBV25-1 (Vβ11) chain (Godfrey D I et al. (2004) Nat Rev Immunol 4: 231-237) and recognize glycolipids such as α-galactosylceramide, self-glycolipids, or isoglobotrihexosylceramide (Tupin E et al. (2007) Nat Rev Microbiol 5: 405-417). Both cell types play an important role in regulating immune responses to infection, tumors, and autoimmune disease and in tolerance induction (Godfrey D I et al. (2004) J Clin Invest 114: 1379-1388). The frequencies of MAIT cells and iNKT cells obtained in this study are consistent with previous reports showing that MAIT cells expand to 1-4% of peripheral blood T cells (Martin E et al. (2009) PLoS Biol 7: e54) and that iNKT cells account for 0.2% of total PBMCs (Lee P T et al. (2002) J Clin Invest 110: 793-800). Interestingly, there were other types of public sequences containing TRAV1-2 (e.g., TRAV1-2-TRAJ12 and TRAV1-2-TRAJ20), as well as some public TCRα sequences other than the well-known MAIT and iNKT sequences. Thus, NGS-based repertoire analysis is useful both for estimating the frequency of MAIT cells or iNKT cells and for identifying potentially new invariant TCRα chains. Further validation is required to confirm such potentially new invariant TCRα chains.

As discussed above, in the present Example the inventors developed a novel NGS-based TCR repertoire analysis method and used it to reveal the similarity of TCRα and TCRβ repertoires among different individuals and their comparable diversity. Public TCRα sequences include, at high frequency, those of functionally significant T cell subpopulations, namely MAIT and iNKT cells. In addition, an NGS-based approach to finding public TCRs is useful for identifying potentially new invariant TCRα chains. This highly precise TCR repertoire analysis technique was demonstrated to be capable of revealing antigen-specific T cells associated with the onset of human disease, and it can contribute to research, diagnosis, and therapy relating to innate and acquired immunity.

Applied Example 1: Example of Antibody Isolation: Example of Isolation of Human Form Antibody Utilizing BCR Repertoire Analysis

This Example provides, as a specific embodiment for practical application, an example of isolating a human form antibody utilizing BCR repertoire analysis.

(Sources of Reagents and the Like)

Obtaining a human form anti-idiotype antibody using humanized NOG mice

1. In a patient with B cell leukemia or malignant lymphoma, a monoclonal BCR derived from tumor cells is observed to be highly expressed.
2. Peripheral blood mononuclear cells are collected from the patient with B cell leukemia or malignant lymphoma, and the BCR repertoire analysis described in this section is carried out. Among the several tens of thousands of sequence reads determined, the immunoglobulin H chain gene that has the highest ranking and is significantly abundant is identified as the tumor-derived gene.
3. The determined immunoglobulin H chain sequence is used to deduce the amino acid sequence of the highly diverse CDR3 region, and a peptide with the same sequence is synthesized.
4. 200 μg of the synthetic peptide is mixed well with complete Freund's adjuvant (CFA, Sigma Aldrich) and administered subcutaneously to a humanized NOG mouse with a syringe (first immunization). PBS is administered to a control mouse in the same manner. The same amount of antigen peptide is administered again 2 weeks after the first immunization.
5. The lymph nodes or spleen are excised from the mouse 4 weeks after the first immunization. The tissue is finely cut in phosphate buffered saline (PBS, Invitrogen) and filtered through a cell strainer (0.75 μm, BD) to prepare a single-cell suspension.
6. The resulting cells are lysed in Trizol solution (Invitrogen), and the genetic sequences are determined by the BCR repertoire analysis method described herein.
7. The resulting BCR sequences (several tens of thousands of reads) are sorted by frequency of presence (number of reads) to determine high-ranking immunoglobulin H chain and L chain sequences. Immunoglobulin H chain and L chain sequences whose frequency is significantly higher than in the read ranking of the control PBS-administered mouse are selected.
8. For the selected immunoglobulin H chain and L chain sequences, a P20EA adaptor primer and a C terminal primer are used for PCR amplification of the full-length immunoglobulin H chain and L chain genes. Each full-length gene is inserted into the multi-cloning site of the antibody expression vector pEHX1.1 (for antibody H chain, TOYOBO) or pELX2.2 (for antibody L chain, TOYOBO) by a ligation reaction using a Ligation Kit (TAKARA). E. coli TOP10 cells (One Shot® TOP10 Chemically Competent E. coli, Invitrogen) are transformed to obtain an H chain expression plasmid and an L chain expression plasmid.
9. Both plasmids are double-digested with the restriction enzymes BglII and EcoRI. The BglII-EcoRI fragment of the L chain plasmid is then inserted into the BglII-EcoRI cleavage site of the H chain plasmid to obtain an antibody expression plasmid that coexpresses the H chain and the L chain.
10. The antibody expression plasmid is extracted and purified from E. coli using a QIAGEN Plasmid Mini Kit and introduced into CHO cells using a TransIT®-CHO Transfection Kit (TAKARA).
11. The transformed antibody-expressing CHO cell line is expanded in culture. The culture supernatant is collected and purified using a Protein A agarose affinity column (HiTrap Protein A HP Columns, GE Healthcare) according to the manufacturer's instructions.
12. After the amount of antibody protein obtained is measured with an absorption spectrometer, binding reactivity with the antigen peptide is examined by ELISA.
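Step 3 above deduces the CDR3 amino acid sequence from the determined H chain nucleotide sequence before peptide synthesis. A minimal translation sketch using the standard genetic code is shown below. It assumes the CDR3 nucleotide segment and its reading frame are already known (for example, from V/J assignment by the repertoire analysis software); locating the CDR3 boundaries is not shown. The function name `translate` and the example sequence are illustrative only.

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str, frame: int = 0) -> str:
    """Translate an in-frame nucleotide sequence into one-letter amino acids.

    Stop codons are written as '*', codons containing non-ACGT characters
    as 'X', and a trailing partial codon is ignored.
    """
    seq = nucleotides.upper()[frame:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Illustrative input only (not a sequence from this Example).
print(translate("TGTGCGAGATGG"))
```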

For the KM mice used in this Example, see Ishida I, Tomizuka K, Yoshida H, Tahara T, Takahashi N, Ohguma A, Tanaka S, Umehashi M, Maeda H, Nozaki C, Halk E, Lonberg N. Production of human monoclonal and polyclonal antibodies in TransChromo animals. Cloning Stem Cells. 2002; 4(1): 91-102 (review). For NOG mice, see Ito M, Hiramatsu H, Kobayashi K, Suzue K, Kawahata M, Hioki K, Ueyama Y, Koyanagi Y, Sugamura K, Tsuji K, Heike T, Nakahata T. NOD/SCID/gamma(c)(null) mouse: an excellent recipient mouse model for engraftment of human cells. Blood. 2002 Nov. 1; 100(9): 3175-82. For CHO cells and antibody production, see Jayapal K P, Wlaschin K F, Hu W-S, Yap M G S. Recombinant protein therapeutics from CHO cells: 20 years and counting. Chem Eng Prog. 2007; 103: 40-47; and Chusainow J, Yang Y S, Yeo J H, Toh P C, Asvadi P, Wong N S, Yap M G. A study of monoclonal antibody-producing CHO cell lines: what makes a stable high producer? Biotechnol Bioeng. 2009 Mar. 1; 102(4): 1182-96.

Applied Example 2: Cancer Idiotype Peptide Sensitization Immune Cell Therapeutic Method

The present Example demonstrates a cancer idiotype peptide sensitization immune cell therapeutic method using the repertoire analysis of the present invention. The procedure is described below (see FIG. 62).

(1) 10 mL of whole blood is collected from a malignant lymphoma patient. Peripheral blood mononuclear cells (PBMC) are separated by Ficoll-Paque gradient centrifugation (GE Healthcare Bioscience, 17-1440-02).
(2) Total RNA is extracted from the patient PBMCs using Trizol Reagent (Invitrogen).
(3) cDNA is synthesized from the RNA with a reverse transcriptase (Superscript II, Invitrogen, 18064-014), and double-stranded DNA is then synthesized with DNA Polymerase (Invitrogen, 18010-017), E. coli Ligase (Invitrogen, 18052-019), and RNase H (Invitrogen, 18021071). The ends are then blunted with T4 DNA Polymerase (Invitrogen, 18005-025). After ligation of a P20EA/P10EA adaptor with T4 ligase (Invitrogen, 15224-025) (see Preparation Example 2 and the like), the product is digested with NotI (TaKaRa, 1166A).
(4) The 1st PCR is carried out using a P20EA adaptor (SEQ ID NO: 2) and an IgM C region-specific primer of the BCR gene (CM1 (SEQ ID NO: 5)), and the 2nd PCR is carried out using CM2 (SEQ ID NO: 6) and a P20EA primer (SEQ ID NO: 2). Each PCR consists of 20 cycles of 30 seconds at 95° C., 30 seconds at 55° C., and 1 minute at 72° C.
(5) Column purification is performed with a High Pure PCR Cleanup Micro Kit (Roche) to remove primers from the 2nd PCR amplicon. PCR is then performed using a B-P20EA primer (SEQ ID NO: 4), which is the P20EA primer (SEQ ID NO: 2) with an added adaptor B sequence (SEQ ID NO: 1375), and a GS-PCR primer (see Table 1-1 for sequence information), which is an IgM C region-specific primer (CM3) with an added adaptor A sequence (SEQ ID NO: 39) and an MID Tag identification sequence (see Table 1-6).
(6) After GS-PCR amplification, 2% agarose gel electrophoresis is carried out. After visualization, a band of the size of interest (500-700 bp) is cut out and purified using a QIAEX II Gel Extraction Kit (QIAGEN). The amount of recovered DNA is measured using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen). 10 million DNA molecules are used in emulsion PCR for sequence analysis with Roche's next generation sequencer (GS Junior Bench Top system).
(7) The TCR/BCR repertoire analysis software newly developed in the present invention (Repertoire Genesis; see Analytical test examples, Analysis Examples 1-5 and the like herein) is applied to the obtained sequence data to assign V and J sequences and to determine the deduced amino acid sequence of the CDR3 region. At the same time, the number of copies of each identical base sequence is counted to rank sequences by frequency of appearance.
(8) The highest-ranking BCR gene is determined. When its reads account for 10% or more of the total, which is notably high, the BCR gene is identified as a tumor-derived BCR gene.
(9) The HLA binding peptide prediction program BIMAS (www dot bimas dot cit dot nih dot gov/) is used to predict HLA-binding peptides from the deduced amino acid sequence of the tumor-derived BCR gene. Default settings are used unless a particular condition is specified. The BCR amino acid sequence and the patient's HLA type are input into BIMAS to determine the predicted HLA binding peptide with the highest score among peptides within the CDR3 amino acid sequence or peptides comprising part of the CDR3 amino acid sequence.
(10) A cytotoxic T cell (CTL) therapeutic method or a dendritic cell (DC) vaccine therapeutic method is carried out using a high-scoring HLA-binding peptide as an individualized cancer peptide. Here, a DC vaccine therapeutic method is implemented.
(11) The individualized cancer peptide is chemically synthesized using a fully automatic peptide synthesizer (Protein Technologies, Inc.). Peptides with a yield of 1 mg or more and a purity of 95% or more are obtained. The peptides are dissolved in 50% DMSO and stored at −20° C.
(12) A blood component collecting apparatus (Terumo apheresis apparatus AC-555) is used to separate monocytes from the cancer patient. After the cells including the monocytes are washed in AIM-V medium (Invitrogen, 12055091), the cells are counted.
(13) After cells that do not adhere to a plastic plate are removed, the adherent cells are cultured for about 1 week in AIM-V medium containing 2000 U/mL granulocyte macrophage colony stimulating factor (GM-CSF, Wako Pure Chemical) and 400 U/mL interleukin-4 (IL-4, Petrotech) to induce differentiation into dendritic cells (DC).
(14) Differentiation into DCs is confirmed by examining the expression of MHC class I and II molecules, CD40, CD80, or CD86 by FACS analysis. 20 μg/mL of the individualized cancer peptide is then added to 2×10⁶ cells, which are cultured for a further day with a stimulating factor (Picibanil (OK-432), Picibanil Injection 0.5KE, Chugai Pharmaceutical) in AIM-V medium (same as above).
(15) The peptide-stimulated DCs are collected, washed with saline, and administered to the cancer patient by intravenous drip infusion.

(Results)

The following is accomplished by the present Example.

(1) Next generation BCR repertoire analysis of the peripheral blood of a malignant lymphoma patient identifies one type of IgM immunoglobulin heavy chain and one type of IgM immunoglobulin light chain accounting for 50% or more of all BCR reads.
(2) The CDR3 region of these immunoglobulin genes is identified by the Repertoire Genesis program.
(3) The patient's HLA type (e.g., HLA-A*02) and the IgM immunoglobulin heavy chain CDR3 amino acid sequence are input into the BIMAS program, and the peptide sequence with the highest binding score is selected.
(4) This peptide is chemically synthesized as an individualized cancer peptide using a fully automatic peptide synthesizer, and the peptide is used to stimulate and activate the patient's DCs in vitro.
(5) The individualized peptide-stimulated DCs are administered intravenously to the patient, whereby a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) A BCR sequence derived from the patient's cancer cells can be used to make an individualized cancer peptide for therapy. A DC therapeutic method or CTL therapeutic method can therefore be administered to a wide range of patients, regardless of HLA type or antigen expression.
(2) Since a peptide adapted to the patient's HLA is used, it is possible to realize an effective DC therapeutic method or CTL therapeutic method that is better adapted to the patient and highly specific to cancer cells.
(3) Since an antigen peptide can be chemically synthesized directly from a genetic sequence obtained by BCR analysis, the method is highly safe and requires no antigen identification or the like.

Applied Example 3: Improved CTL Method

The present Example demonstrates an improved CTL method using the repertoire analysis of the present invention. The procedure is described below (see FIG. 63).

(1) A cancer idiotype peptide is identified by steps (1)-(9) of Applied Example 2.
(2) An existing cancer peptide (NY-ESO-1 peptide) or a cancer idiotype peptide (the peptide identified in (1)) is chemically synthesized using a fully automatic peptide synthesizer (Protein Technologies, Inc.). Peptides with a yield of 1 mg or more and a purity of 95% or more are obtained. The peptides are dissolved in 50% DMSO and stored at −20° C.
(3) 20 mL of peripheral blood is collected from the cancer patient. Peripheral blood mononuclear cells (PBMC) are separated by Ficoll-Paque gradient centrifugation (see Applied Example 2).
(4) CD8⁺ T cells are separated using CD8⁺ T cell separation magnetic beads (Miltenyi Biotec) or a flow cytometer (FACS Aria II, Becton Dickinson).
(5) Monocytes separated with a blood component collecting apparatus (Terumo apheresis apparatus AC-555), or PBMCs, are cultured in a culture plate (100 mm dish, Corning, 353003), and non-adherent cells are removed.
(6) The adherent monocytes are cultured for about 1 week in AIM-V medium (same as Applied Example 2) containing 2000 U/mL granulocyte macrophage colony stimulating factor (GM-CSF, Wako Pure Chemical) and 400 U/mL interleukin-4 (IL-4, Petrotech) to induce differentiation into dendritic cells (DC).
(7) After differentiation into DCs is confirmed, 20 μg/mL of peptide (the "estimated HLA binding peptide exhibiting the highest score" in Applied Example 2) is added to 2×10⁶ cells, which are cultured for a further day with a stimulating factor (Picibanil (OK-432), Picibanil Injection 0.5KE, Chugai Pharmaceutical) in AIM-V medium.
(8) The DC culture is further stimulated and cultured with 20 μg/mL of the synthetic peptide (the "estimated HLA binding peptide exhibiting the highest score" in Applied Example 2) and 2×10⁶/mL of the CD8⁺ T cells separated in (4) above, in AIM-V medium (see Applied Example 2 and the like).
(9) After the CD8⁺ T cells proliferated by antigen stimulation are separated from the DCs adhering to the plastic culture plate (100 mm dish, Corning, 353003; the same plate as in (5)), the cells are expanded in culture in the presence of 5 μg/mL anti-CD3 antibody (OKT3, Orthoclone OKT3, Janssen Pharmaceutical) and 200 U/mL interleukin-2 (IL-2) (Roche Applied Science, 10799068001).
(10) The activated CD8⁺ T cells are collected as CTLs, washed with saline, and administered to the cancer patient by intravenous drip infusion.

(Results)

The present Example accomplishes the following.

(1) An HLA binding peptide is identified from the CDR3 region of a BCR gene derived from tumor cells of a malignant lymphoma patient.
(2) 2×10⁶ CD8-positive cells were collected from the patient's peripheral blood with CD8⁺ T cell separation magnetic beads. The purity is 98%.
(3) Antigen stimulation is applied in a mixed culture of the peptide, CD8⁺ cells, and DCs derived from the patient's monocytes. In the subsequent expansion culture in the presence of anti-CD3 antibody and IL-2, the CD8⁺ CTLs can proliferate up to 50-fold.
(4) The cultured CTLs are administered intravenously to the patient, whereby a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) A BCR sequence derived from the patient's cancer cells can be used to make an individualized cancer peptide for therapy. A CTL therapeutic method can therefore be administered to a wide range of patients, regardless of HLA type or antigen expression.
(2) Since a peptide adapted to the patient's HLA is used, it is possible to realize an effective CTL therapeutic method that is better adapted to the patient and highly specific to cancer cells.
(3) Since an antigen peptide can be chemically synthesized directly from a genetic sequence obtained by BCR analysis, the method is highly safe and requires no antigen identification or the like.

Applied Example 4: DC Vaccine Therapeutic Method

The present Example demonstrates a DC vaccine therapeutic method using the repertoire analysis of the present invention. The procedure is described below (see FIG. 64).

(1) A cancer idiotype peptide is identified by steps (1)-(9) of Applied Example 2.
(2) An existing cancer peptide (NY-ESO-1 peptide) or a cancer idiotype peptide (the peptide identified in (1)) is chemically synthesized using a fully automatic peptide synthesizer (Protein Technologies, Inc.). Peptides with a yield of 1 mg or more and a purity of 95% or more are obtained. The peptides are dissolved in 50% DMSO and stored at −20° C.
(3) Monocytes are separated from the cancer patient by component collection (apheresis) with a blood component collecting apparatus (Terumo apheresis apparatus AC-555). The cells including monocytes are washed in AIM-V medium (see Applied Example 2 and the like) and counted.
(4) After cells that do not adhere to a plastic plate (100 mm dish, Corning, 353003) are removed, the adherent cells are cultured for about 1 week in AIM-V medium (see Applied Example 2) containing 2000 U/mL granulocyte macrophage colony stimulating factor (GM-CSF, Wako Pure Chemical) and 400 U/mL interleukin-4 (IL-4, Petrotech) to induce differentiation into dendritic cells (DC).
(5) Differentiation into DCs is confirmed by examining the expression of MHC class I and II molecules, CD40, CD80, or CD86 by FACS. 20 μg/mL of peptide (the peptide synthesized in (2)) is added to 2×10⁶ cells, which are cultured for a further day with a stimulating factor (Picibanil (OK-432), Picibanil Injection 0.5KE, Chugai Pharmaceutical) in AIM-V medium (see Applied Example 2 and the like).
(6) The peptide-stimulated DCs are collected, washed with saline, and administered to the cancer patient by intravenous drip infusion (Terufusion Infusion System, Terumo).

(Results)

The present Example accomplishes the following.

(1) An HLA binding peptide is identified from the CDR3 region of a BCR gene derived from tumor cells of a malignant lymphoma patient.
(2) Monocytes are separated from the patient's peripheral blood and cultured in a differentiation medium, and MHC DR⁺, CD40⁺, or CD80/CD86⁺ cells are detected to confirm differentiation from monocytes into DCs.
(3) The peptide-stimulated DCs are administered intravenously to the patient, whereby a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) A BCR sequence derived from the patient's cancer cells can be used to make an individualized cancer peptide for therapy. A DC therapeutic method can therefore be administered to a wide range of patients, regardless of HLA type or antigen expression.
(2) Since a peptide adapted to the patient's HLA is used, it is possible to realize an effective DC therapeutic method that is better adapted to the patient and highly specific to cancer cells.
(3) Since an antigen peptide can be chemically synthesized directly from a genetic sequence obtained by BCR analysis, the method is highly safe and requires no antigen identification or the like.

Applied Example 5: Patient Autologous Immune Cell Therapeutic Method

The present Example demonstrates a patient autologous immune cell therapeutic method using the repertoire analysis of the present invention. The procedure is described below (see FIG. 65).

(1) A cancer idiotype peptide is identified by steps (1)-(9) of Applied Example 2.
(2) An existing cancer peptide or a cancer idiotype peptide (the peptide identified in (1)) is chemically synthesized using a fully automatic peptide synthesizer (Protein Technologies, Inc.). Peptides with a yield of 1 mg or more and a purity of 95% or more are obtained. The peptides are dissolved in 50% DMSO and stored at −20° C.
(3) 20 mL of peripheral blood is collected from a cancer patient. Peripheral blood mononuclear cells (PBMC) are separated by Ficoll-Paque gradient centrifugation.
(4) CD8⁺ T cells are separated using CD8⁺ T cell separation magnetic beads (Miltenyi Biotec) or a flow cytometer (FACS Aria II, Becton Dickinson).
(5) Monocytes separated with a blood component collecting apparatus (Terumo apheresis apparatus AC-555), or PBMCs, are cultured in a culture plate (100 mm dish, Corning, 353003), and non-adherent cells are removed.
(6) The adherent monocytes are cultured for about 1 week in AIM-V medium (same as Applied Example 2) containing 2000 U/mL granulocyte macrophage colony stimulating factor (GM-CSF, Wako Pure Chemical) and 400 U/mL interleukin-4 (IL-4, Petrotech) to induce differentiation into dendritic cells (DC).
(7) After differentiation into DCs is confirmed, 20 μg/mL of peptide (the peptide synthesized in (2)) is added to 2×10⁶ cells, which are cultured for a further day with stimulating factors in AIM-V medium.
(8) The DC culture is further stimulated and cultured with 20 μg/mL of the synthetic peptide (the peptide synthesized in (2)) and 2×10⁶/mL of the CD8⁺ T cells isolated in (4) above, in AIM-V medium (same as Applied Example 2 and the like).
(9) The activated CD8⁺ T cells are collected together with the peptide-stimulated DCs, washed with saline, and administered to the cancer patient by intravenous drip infusion.

(Results)

The present Example accomplishes the following.

(1) An HLA binding peptide is identified from the CDR3 region of a BCR gene derived from tumor cells of a malignant lymphoma patient.
(2) 2×10⁶ CD8-positive cells were collected from the patient's peripheral blood with CD8⁺ T cell separation magnetic beads. The purity is 98% or higher.
(3) Monocytes are separated from the patient's peripheral blood and cultured in a differentiation medium, and differentiation into DCs (MHC DR⁺, CD40⁺, or CD80/CD86⁺) is confirmed.
(4) Tumor-specific CTLs and DCs can proliferate in a mixed culture of the peptide, CD8⁺ cells, and DCs derived from the patient's monocytes.
(5) The peptide-stimulated CD8⁺ cells and DCs are both administered intravenously to the patient, whereby a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) A BCR sequence derived from the patient's cancer cells can be used to make an individualized cancer peptide for therapy. A patient autologous immune cell therapeutic method can therefore be administered to a wide range of patients, regardless of HLA type or antigen expression.
(2) Since a peptide adapted to the patient's HLA is used, it is possible to realize an effective patient autologous immune cell therapeutic method that is better adapted to the patient and highly specific to cancer cells.
(3) Since an antigen peptide can be chemically synthesized directly from a genetic sequence obtained by BCR analysis, the method is highly safe and requires no antigen identification or the like.
(4) A synergistic effect of DCs and CTLs can be expected, and a high therapeutic effect is anticipated.

Applied Example 6: Isolation of Tailor-Made Cancer Specific T Cell Receptor Gene, Isolation of Cancer Specific TCR Gene by In Vitro Antigen Stimulation

The present Example demonstrates the isolation of a tailor-made cancer specific T cell receptor gene, that is, the isolation of a cancer specific TCR gene by in vitro antigen stimulation, using the repertoire analysis of the present invention. The procedure is described below (see FIG. 66).

(1) Tumor cells are collected from a cancer patient by a conventional method.
(2) The patient-derived tumor tissue is finely minced in culture medium (RPMI 1640, 11875-093, Invitrogen; hereinafter also referred to as "culture solution") and filtered through a 0.70 μm filter (Falcon cell strainer, Corning). The cells are thereby dispersed into single cells and then inactivated with 10 μg/mL mitomycin C (Mitomycin C for injection, Kyowa Hakko Kirin) for 2 hours at 37° C. in the culture solution.
(3) Peripheral blood mononuclear cells (PBMC) are separated from 10 mL of the cancer patient's whole blood by Ficoll-Paque gradient centrifugation. The PBMCs are washed and then suspended in culture medium (RPMI 1640) at a concentration of 2×10⁶/mL.
(4) RNA is extracted from a portion (1×10⁶ cells) of the PBMCs with a Trizol RNA extraction kit (Invitrogen) as an untreated control sample.
(5) The inactivated tumor cells and the peripheral blood cells are co-cultured for one week in RPMI 1640 medium (RPMI 1640, 11875-093, Invitrogen) containing 10% FCS (16000-044, Invitrogen) in the presence of a low concentration of IL-2, so that tumor-specific T cells are stimulated with antigen and proliferate.
(6) After activation of the T cells, live cells are collected from the culture medium and washed with PBS (045-29795, Wako Pure Chemical), and RNA is extracted from the cells.
(7) The repertoire analysis method of the present invention is carried out using the RNA samples extracted in (4) and (6) (the conditions described in the Analytical test examples and Analysis Examples 1-5 herein can be used).
(8) From the TCR sequence data obtained by the next generation repertoire analysis of the present invention, TCR genes that are greatly increased in the stimulated sample relative to the control sample are extracted and ranked, and high-ranking TCRα and TCRβ genes are selected.
(9) The full-length TCRα and TCRβ genes are each cloned and introduced into a retroviral gene expression vector (Retro-X Vectors and Systems, Clontech).
(10) A gene-transducing retrovirus is produced by transfecting the packaging cell line GP2-293 (631458, Clontech) with the TCRα and TCRβ recombinant plasmid vectors prepared in (9).
(11) Lymphocytes separated with a blood component collecting apparatus (Terumo apheresis apparatus AC-555) are infected independently and successively with the recombinant TCRα and TCRβ retroviruses to obtain a population of lymphocytes expressing a functional αβ TCR.
(12) Surface expression of TCRα/TCRβ heterodimers and the percentage of positive cells are confirmed by FACS (the same conditions as in Applied Example 5 can be used).
(13) The tumor-specific patient lymphocytes expressing the TCRα/TCRβ of interest are administered to the patient.
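Step (8) above selects TCR genes that are greatly increased in the stimulated sample relative to the untreated control. One way to express "greatly increased" is a frequency fold change with a pseudocount, so that clonotypes absent from the control do not cause division by zero. The sketch below is a hypothetical illustration: the 10-fold default is borrowed from the threshold used in Applied Example 7 rather than stated for this step, and the pseudocount of 1 read and the name `enriched_clonotypes` are assumptions.

```python
from collections import Counter


def enriched_clonotypes(
    stimulated: Counter,
    control: Counter,
    min_fold: float = 10.0,
    pseudocount: float = 1.0,
) -> list[tuple[str, float]]:
    """Rank clonotypes by frequency fold change (stimulated / control).

    Frequencies are read counts divided by the sample's total reads; the
    control count receives a pseudocount so unseen clonotypes get a finite
    fold change. Only clonotypes with fold >= min_fold are returned,
    highest fold first.
    """
    stim_total = sum(stimulated.values())
    ctrl_total = sum(control.values())
    results = []
    for clonotype, n in stimulated.items():
        stim_freq = n / stim_total
        ctrl_freq = (control.get(clonotype, 0) + pseudocount) / ctrl_total
        fold = stim_freq / ctrl_freq
        if fold >= min_fold:
            results.append((clonotype, fold))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy read counts (hypothetical clonotype labels, not experimental data).
stim = Counter({"T1": 500, "T2": 10, "T3": 490})
ctrl = Counter({"T1": 1, "T2": 100, "T3": 899})
print(enriched_clonotypes(stim, ctrl))
```

The same routine can be run separately on TCRα and TCRβ data to produce the two ranked lists from which high-ranking genes are chosen.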

(Results)

The present Example accomplishes the following.

(1) When TCR genes that increase upon stimulation with the patient's tumor cells are selected and ranked by comparing the stimulated sample with the control sample, TCRs that are abundant in peripheral blood cells are excluded. Thus, numerous tumor-specific TCR genes are extracted.
(2) TCRα and TCRβ genes at about the same ranking level are selected from the extracted genes and used to make lymphocytes transduced with a tumor-specific TCR gene.
(3) The full-length TCRα and TCRβ chain genes can be cloned into a retroviral expression vector. A TCRα retrovirus and a TCRβ retrovirus with high titers can be produced by packaging.
(4) The patient's lymphocytes are infected with the mixed retroviruses, and expression of recombinant TCRα/TCRβ is verified by FACS.
(5) The tumor-specific TCR gene-recombinant lymphocytes produced by this series of steps are administered to the patient, whereby a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.
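Result (2) pairs TCRα and TCRβ genes "at about the same level of ranking". Bulk repertoire data do not record which α and β chains come from the same cell, so any pairing rule here is a heuristic that still requires functional verification (as in the FACS check of result (4)). The sketch below shows one hypothetical heuristic, not the inventors' procedure: greedy matching of α and β clonotypes whose frequencies differ by at most a factor of 2 (a `max_log2_ratio` of 1.0, an assumed parameter). The name `pair_by_frequency` and the toy frequencies are illustrative.

```python
import math


def pair_by_frequency(
    alpha: dict[str, float], beta: dict[str, float], max_log2_ratio: float = 1.0
) -> list[tuple[str, str]]:
    """Greedily pair alpha and beta clonotypes with similar frequencies.

    Alpha clonotypes are processed from most to least frequent; each is
    matched to the unused beta clonotype with the closest frequency on a
    log2 scale, provided the ratio is within 2**max_log2_ratio.
    All frequencies must be positive.
    """
    available = dict(beta)
    pairs = []
    for a_name, a_freq in sorted(alpha.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        b_name = min(available, key=lambda b: abs(math.log2(available[b] / a_freq)))
        if abs(math.log2(available[b_name] / a_freq)) <= max_log2_ratio:
            pairs.append((a_name, b_name))
            del available[b_name]
    return pairs


# Toy frequencies among enriched clonotypes (hypothetical values).
alpha_freqs = {"A1": 0.20, "A2": 0.05}
beta_freqs = {"B1": 0.18, "B2": 0.06, "B3": 0.001}
print(pair_by_frequency(alpha_freqs, beta_freqs))
```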

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) The patient's own cancer cells and T cells can be used to make therapeutic lymphocytes transduced with a tumor-specific TCR gene. TCR gene therapy can therefore be administered to a wide range of patients, regardless of HLA type or antigen expression.
(2) Since a TCR sequence present in a patient sample is used, it is possible to use a TCR gene matching the patient's HLA and to realize an effective therapy that is highly specific to cancer cells.
(3) Since a TCR sequence present in a patient sample is used, it is possible to realize highly safe TCR gene therapy that does not react with the patient's normal cells.
(4) Since a genetic sequence obtained by TCR analysis is used directly, there is no need to identify an antigen or to obtain a TCR gene using a specific antigen.

Applied Example 7: Preparation of Isolated Cancer Specific TCR Gene by In Vitro Antigen Stimulation

The present Example demonstrates the preparation of an isolated cancer specific TCR gene by in vitro antigen stimulation using the repertoire analysis of the present invention. The procedure is described below (see FIG. 67).

(1) Tumor cells are collected and peripheral blood is separated at the same time from each of several cancer patients with the same HLA.
(2) RNA is extracted using Trizol Reagent (Invitrogen) from lymphocytes or from tumor tissue containing tumor-infiltrating T cells.
(3) TCR genes are amplified from the RNA by the adaptor-ligation PCR described in the Preparation Examples and the like herein, and repertoire analysis is performed by next generation sequencing with a GS Junior Bench Top system (Roche) or the like.
(4) The newly developed TCR/BCR repertoire analysis software (Repertoire Genesis; see Analysis Examples 1-5 herein) is applied to the resulting TCR sequences to determine the V, D, and CDR3 region sequences and to rank sequences by the frequency of identical sequences.
(5) In each patient, TCR genes with a high frequency in tumor cells relative to peripheral blood cells (in this specific example, those with a >10-fold higher frequency and a high ranking in the tumor tissue) are searched for and identified as tumor specific.
(6) Among these tumor-specific TCR genes, TCR sequences shared among multiple cancer patients with the same HLA are searched for.
(7) The tumor-specific TCR gene shared by the largest number of cancer patients is selected as the tumor-specific TCR for therapy.
(8) The full-length TCRα and TCRβ genes are cloned and introduced into a retroviral gene expression vector (the same vector as in Applied Example 6 can be used).
(9) A gene-transducing virus is produced from the TCRα and TCRβ expression retroviral vectors according to step (10) of Applied Example 6 above.
(10) Lymphocytes collected from the patient are mixed with equal volumes of culture solution containing the TCRα retrovirus produced in (9) and culture solution containing the TCRβ retrovirus, and cultured for 4 hours at 37° C. The cells are then washed with PBS and cultured for a further 24 hours at 37° C.
(11) Surface expression of the recombinant TCRαβ molecule is verified. The percentage of transduced TCRβ chain-positive cells among CD8-positive cells is determined by FACS analysis using anti-human CD8 antibody (CD8c, 6602385, Beckman Coulter) and an IOTest Beta Mark TCR Vβ repertoire kit (multi-analysis TCR Vβ antibodies, IM3497, Beckman Coulter).
(12) Cells confirmed in (11) to express the TCRαβ of interest are cultured in RPMI 1640 medium at 37° C. at a concentration of 0.5×10⁶ cells. After the lymphocytes transduced with the tumor-specific TCR gene are washed with PBS, they are administered to the cancer patient by intravenous drip infusion (Terufusion Infusion System, Terumo).
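Steps (5)-(7) above combine a per-patient filter (more than 10-fold higher frequency in tumor than in peripheral blood, and a high tumor ranking) with a cross-patient count among patients sharing the same HLA. A minimal sketch of that selection logic follows. The 10-fold cutoff comes from the text; the `top_n` cutoff for "high ranking", the frequency `floor` used when a clonotype is absent from blood, and the names `tumor_specific` and `most_shared` are hypothetical assumptions.

```python
from collections import Counter


def tumor_specific(
    tumor: dict[str, float],
    blood: dict[str, float],
    min_fold: float = 10.0,
    top_n: int = 50,
    floor: float = 1e-6,
) -> set[str]:
    """Clonotypes ranked in the tumor's top `top_n` whose tumor frequency
    exceeds their blood frequency by more than `min_fold`.
    `floor` stands in for the blood frequency of clonotypes not seen there."""
    ranked = sorted(tumor, key=tumor.get, reverse=True)[:top_n]
    return {c for c in ranked if tumor[c] / max(blood.get(c, 0.0), floor) > min_fold}


def most_shared(
    patients: dict[str, tuple[dict[str, float], dict[str, float]]], **kwargs
) -> list[tuple[str, int]]:
    """Count how many patients (same HLA assumed) carry each tumor-specific
    clonotype, most widely shared first."""
    sharing: Counter = Counter()
    for tumor, blood in patients.values():
        sharing.update(tumor_specific(tumor, blood, **kwargs))
    return sharing.most_common()


# Toy (tumor, blood) clonotype frequencies per patient (hypothetical).
cohort = {
    "P1": ({"X": 0.3, "Y": 0.1, "Z": 0.05}, {"X": 0.001, "Y": 0.05}),
    "P2": ({"X": 0.2, "W": 0.1}, {"W": 0.0005}),
    "P3": ({"Y": 0.4, "X": 0.01}, {"X": 0.01}),
}
print(most_shared(cohort)[0])
```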

(Results)

The present Example accomplishes the following.

(1) When TCR genes shared among the tumor tissues of patients are selected and ranked, TCRs present in large numbers in peripheral blood cells are excluded. Numerous tumor specific TCR genes are thus extracted.
(2) A pair of TCRα and TCRβ genes with about the same ranking and present in the same patient is selected from the extracted genes and used to make lymphocytes introduced with a tumor specific TCR gene.
(3) Full length TCRα and TCRβ chain genes can be cloned into a retroviral expression vector, and high-titer TCRα and TCRβ retroviruses can be made by packaging.
(4) Patient lymphocytes are infected with the mixed retroviruses, and expression of the recombinant TCRα/TCRβ is verified by FACS.
(5) When the tumor specific TCR gene recombinant lymphocytes manufactured by this series of steps are introduced into the patient, a reduction in the number of tumor cells and an improvement in clinical symptoms can be observed.
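Result (2) pairs α and β chains of about the same ranking in the same patient. When TCRα and TCRβ are sequenced separately in bulk, the data do not record which chains came from the same cell, so rank matching is only a heuristic. The sketch below pairs chains of equal rank and keeps a pair only if their frequencies are within a factor of `max_ratio`. That cut-off for "about the same level", and the function name, are hypothetical.

```python
def pair_chains(alpha_freqs, beta_freqs, max_ratio=2.0):
    """Pair TCR alpha and beta clonotypes of equal rank and similar frequency.

    alpha_freqs, beta_freqs: dict of clonotype -> (positive) frequency in one
    patient's tumor. A pair is kept only if the larger frequency is at most
    max_ratio times the smaller one (hypothetical cut-off).
    """
    alphas = sorted(alpha_freqs.items(), key=lambda kv: kv[1], reverse=True)
    betas = sorted(beta_freqs.items(), key=lambda kv: kv[1], reverse=True)
    pairs = []
    # zip() matches rank 1 with rank 1, rank 2 with rank 2, and so on.
    for (a, fa), (b, fb) in zip(alphas, betas):
        if max(fa, fb) / min(fa, fb) <= max_ratio:
            pairs.append((a, b))
    return pairs
```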

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) The patient's own cancer cells and T cells can be used to make therapeutic lymphocytes introduced with a tumor specific TCR gene, so TCR gene therapy can be administered to a wide range of patients regardless of HLA type or antigen expression.
(2) Since TCR sequences present in a patient sample are used, a TCR gene matching the HLA of the patient can be used, realizing an effective therapy that is highly specific to cancer cells.
(3) Since TCR sequences present in a patient sample are used, it is possible to realize highly safe TCR gene therapy that does not react with the normal cells of the patient.
(4) Since gene sequences obtained from TCR analysis are used directly, there is no need to identify an antigen or to obtain a TCR gene using a specific antigen.

Applied Example 8: Cell Processing Therapeutic Method

The present Example demonstrates a cell processing therapeutic method using the repertoire analysis of the present invention. The procedure is explained below (see FIG. 68).

(1) A retrovirus for gene transfer is made in accordance with Applied Example 6 to create a lymphocyte population expressing functional αβ TCRs.
(2) Patient-derived tumor cells, separated and inactivated in accordance with the procedures of Applied Example 6 (1)-(2), are diluted with RPMI 1640 medium (11875-093, Invitrogen).
(3) The lymphocytes introduced with the tumor specific TCR genes made in (1) and the inactivated tumor cells of the patient are mixed at a cell concentration of 1×10⁶/mL at lymphocyte-to-tumor cell ratios (E:T ratios) of 2:1, 1:1, and 0.5:1, and cultured for 24 hours at 37° C. using an ELISPOT kit (IFN-γ, Human, ELISpot Kit, EL285, R&D Systems).
(4) After 24 hours, the cells are removed. IFNγ produced on the PVDF membrane is detected by a coloring method, and the number of IFNγ producing cells is counted to assess the tumor specificity of the lymphocytes introduced with tumor specific TCR genes.
(5) When IFNγ production is observed in only 5% of cells or less, a pair with a high ranking and with TCRα and TCRβ present at about the same level is selected from among the TCR genes other than those employed in (8) of Applied Example 6. After steps (9)-(11) of Applied Example 6, a new lymphocyte introduced with a tumor specific TCR gene is made.
(6) Steps (1)-(4) above are carried out for the new TCRα and TCRβ pair to assess the tumor specificity of the lymphocytes introduced with the tumor specific TCR gene.
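The cell numbers in step (3) and the decision in step (5) are simple arithmetic. The sketch below makes two assumptions: that 1×10⁶/mL refers to the lymphocyte (effector) concentration, and that the well volume is 0.1 mL. Neither is stated in the procedure. It computes the tumor (target) cell number for each E:T ratio. It then applies the 5% rule of step (5), read here as: if IFNγ spots correspond to 5% or fewer of the plated lymphocytes, the next TCRα/TCRβ pair is tried. The function names are hypothetical.

```python
EFFECTOR_PER_ML = 1e6  # assumed to be the lymphocyte (effector) concentration


def target_cells(effectors, et_ratio):
    """Tumor (target) cells needed for an E:T ratio, e.g. 2.0 for 2:1."""
    return effectors / et_ratio


def fraction_ifng_positive(spot_count, effectors):
    """Fraction of plated lymphocytes that formed an IFN-gamma spot."""
    return spot_count / effectors


def needs_new_pair(spot_count, effectors, threshold=0.05):
    """Step (5): try another TCR alpha/beta pair if <= 5% of cells respond."""
    return fraction_ifng_positive(spot_count, effectors) <= threshold


well_volume_ml = 0.1  # hypothetical well volume
effectors = round(EFFECTOR_PER_ML * well_volume_ml)  # lymphocytes per well
for ratio in (2.0, 1.0, 0.5):
    print(f"E:T {ratio:g}:1 -> {target_cells(effectors, ratio):.0f} tumor cells")
```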

(Results)

The present Example accomplishes the following.

(1) Lymphocytes introduced with a tumor specific TCR gene are made, and their reactivity to inactivated tumor cells is examined. The lymphocytes introduced with the TCR gene are found to produce IFNγ in response to the tumor cells.
(2) When the lymphocytes introduced with the tumor specific TCR gene are introduced into a patient, an antitumor effect and an improvement in clinical symptoms are observed.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) The patient's own cancer cells and T cells can be used to make therapeutic lymphocytes introduced with a tumor specific TCR gene, so TCR gene therapy can be administered to a wide range of patients regardless of HLA type or antigen expression.
(2) Since TCR sequences present in a patient sample are used, a TCR gene matching the HLA of the patient can be used, realizing an effective therapy that is highly specific to cancer cells.
(3) Since TCR sequences present in a patient sample are used, it is possible to realize highly safe TCR gene therapy that does not react with the normal cells of the patient.
(4) Since gene sequences obtained from TCR analysis are used directly, there is no need to identify an antigen or to obtain a TCR gene using a specific antigen.

Applied Example 9: Method of Assessing Efficacy and/or Safety by In Vitro Stimulation Test

The present Example demonstrates the assessment of efficacy and/or safety by an in vitro stimulation test using the repertoire analysis of the present invention. The procedure is explained below (see FIG. 69).

(1) A retrovirus for gene transfer is made in accordance with Applied Example 6 to create a lymphocyte population expressing a tumor specific αβ TCR.

<Efficacy Assessment>

(1) For efficacy assessment, patient-derived cancer cells are extracted/separated, minced in culture solution (RPMI 1640, 11875-093, Invitrogen), and filtered through a 70 μm filter (Falcon cell strainer, Corning) to obtain single cells. The cells are inactivated by treatment with 10 μg/ml mitomycin C (Mitomycin C for injection, Kyowa Hakko Kirin) in the culture solution for 2 hours at 37° C. After inactivation, the cells are mixed and cultured with the T lymphocytes introduced with a tumor specific TCR gene made as described in Applied Example 6.
(2) Reactivity to tumor cells is assessed by the ELISPOT assay described in Applied Example 8. That is, the lymphocytes introduced with a tumor specific TCR gene made in accordance with Applied Example 6 are mixed with the tumor cells at a cell concentration of 1×10⁶/mL at lymphocyte-to-tumor cell ratios (E:T ratios) of 2:1, 1:1, and 0.5:1, and cultured for 24 hours at 37° C. using an ELISPOT kit (IFN-γ, Human, ELISpot Kit, EL285, R&D Systems).
(3) After 24 hours, the cells are removed. IFNγ produced on the PVDF membrane is detected by a coloring method, and the number of IFNγ producing cells is counted to assess the tumor specificity of the lymphocytes introduced with a tumor specific TCR gene. Besides ELISPOT, assessment can be performed by a cell proliferation test such as the MTT assay (Cell Proliferation Kit I, MTT assay, 11465007001, Roche Diagnostics) or an IL-2 production test (Human IL-2 ELISA system, GE Healthcare, RPN5965).

(Results)

The present Example accomplishes the following.

(1) When the reactivity of the tumor specific TCR gene recombinant lymphocytes to inactivated tumor cells is examined, IFNγ production is observed at a high frequency.
(2) The number of IFNγ positive cells increases over the culture period and reaches a plateau after 24 hours.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) Before gene therapy using lymphocytes introduced with a tumor specific TCR gene is administered, its efficacy can be assessed using the patient's own cells, so that efficacy can be predicted before therapy.
(2) By assessing efficacy, the TCR gene to be used can be selected, enabling more effective TCR gene therapy.

<Safety Assessment>

(1′) For safety assessment, the same tests as in (1) and (2) are carried out using as a control an existing cell line, normal tissue of the patient considered to be free of cancer cells (part of the normal tissue collected during tumor extraction), or, in the case of a solid tumor, peripheral blood cells of the patient.
(2′) The reactivity of the T lymphocytes introduced with a tumor specific TCR gene to normal tissue is quantified and assessed by ELISPOT.
(3′) T lymphocytes introduced with a tumor specific TCR gene that show low reactivity to normal cells and high reactivity to tumor cells are selected for use in therapy of the patient.
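The selection in (3′) combines the efficacy and safety readouts. As a hedged sketch, each candidate lymphocyte line can be summarized by its IFNγ spot counts against tumor cells and against the normal control. The candidates that respond to tumor but not to normal cells are then ranked by the difference. The class, the function and both thresholds (`min_tumor`, `max_normal`) are hypothetical. The source gives no numeric safety cut-off, and the 5% tumor threshold only loosely mirrors step (5) of Applied Example 8.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # identifier of a TCR-transduced lymphocyte line
    tumor_spots: int   # IFN-gamma spots against inactivated tumor cells
    normal_spots: int  # IFN-gamma spots against the normal-cell control
    effectors: int     # lymphocytes plated per well


def select_for_therapy(candidates, min_tumor=0.05, max_normal=0.01):
    """Keep tumor-reactive, normal-sparing candidates, best margin first.

    min_tumor and max_normal are hypothetical cut-offs expressed as the
    fraction of plated lymphocytes producing IFN-gamma.
    """
    ok = [
        c for c in candidates
        if c.tumor_spots / c.effectors >= min_tumor
        and c.normal_spots / c.effectors <= max_normal
    ]
    # Rank by how much more strongly each line reacts to tumor than to normal cells.
    return sorted(
        ok,
        key=lambda c: (c.tumor_spots - c.normal_spots) / c.effectors,
        reverse=True,
    )
```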

(Results)

The present Example accomplishes the following.

(1) When lymphocytes introduced with a tumor specific TCR gene are made and their reactivity to inactivated normal cells is examined, no IFNγ production is observed, indicating that the lymphocytes show hardly any reactivity to normal cells.

(Discussion)

It is understood from the present Example that the present invention accomplishes the following effects.

(1) Before a high-risk gene therapy using lymphocytes introduced with a tumor specific TCR gene is administered, its safety can be assessed using the patient's own cells, realizing a safer therapy.
(2) By assessing safety, high-risk TCR genes can be excluded so that therapy is administered using a safer TCR gene.

As described above, the present invention has been exemplified by the use of its preferred embodiments. However, it is understood that the scope of the present invention should be interpreted solely based on the claims. It is also understood that any patent, any patent application, and any references cited herein should be incorporated by reference in the present specification in the same manner as if their contents were specifically described herein. The present application claims priority to Japanese Patent Application Nos. 2013-241403, 2013-241404, and 2013-241405, the entire contents of which are incorporated herein by reference.

INDUSTRIAL APPLICABILITY

The present invention is especially useful in clinical applications where quantitative analysis is required and samples are provided for highly precise, unbiased, large scale gene analysis.

SEQUENCE LISTING FREE TEXT

SEQ ID NOs: 1-19: primer sequence used in Example 1 (Table 1)
SEQ ID NOs: 20-31: CDR3 amino acid sequence of BCR read
SEQ ID NOs: 32-38: primer sequence used in Example 2 (Table 2)
SEQ ID NO: 39: sequence of Adaptor-A
SEQ ID NOs: 40-60: sequencing primer (Table 6)
SEQ ID NOs: 61-1164: CDR3 amino acid sequence of BCR read (Table 1H)
SEQ ID NOs: 1165-1324: TCR read in serially diluted Molt-4 cell samples
SEQ ID NOs: 1325-1374: example of molecule identification (MID Tag) sequence
SEQ ID NO: 1375: sequence of Adaptor-B
SEQ ID NOs: 1376-1379: each full length sequence of TCR
SEQ ID NOs: 1381-1386: each full length sequence of BCR
SEQ ID NO: 1387: specific sequence (CM3) in CM3-GS (SEQ ID NO: 7)
SEQ ID NO: 1388: specific sequence (CA3) in CA3-GS (SEQ ID NO: 10)
SEQ ID NO: 1389: specific sequence (CG3) in CG3-GS (SEQ ID NO: 13)
SEQ ID NO: 1390: specific sequence (CD3) in CD3-GS (SEQ ID NO: 16)
SEQ ID NO: 1391: specific sequence (CE3) in CE3-GS (SEQ ID NO: 19)
SEQ ID NO: 1392: target sequence TRBC, name TRBC2*01, membrane bound form
SEQ ID NO: 1393: target sequence TRBC, name TRBC2*02, membrane bound form
SEQ ID NO: 1394: target sequence TRGC, name TRGC1*02, membrane bound form
SEQ ID NO: 1395: target sequence TRGC, name TRGC2*01, membrane bound form
SEQ ID NO: 1396: target sequence TRGC, name TRGC2*02, membrane bound form
SEQ ID NO: 1397: target sequence TRGC, name TRGC2*03, membrane bound form
SEQ ID NO: 1398: target sequence TRGC, name TRGC2*04, membrane bound form
SEQ ID NO: 1399: target sequence TRGC, name TRGC2*05, membrane bound form
SEQ ID NO: 1400: target sequence IGHA, name IGHA2*01, secreted form
SEQ ID NO: 1401: target sequence IGHA, name IGHA2*02s, secreted form
SEQ ID NO: 1402: target sequence IGHA, name IGHA2*02, membrane bound form
SEQ ID NO: 1403: target sequence IGHA, name IGHA2*03, secreted form
SEQ ID NO: 1404: target sequence IGHD, name IGHD*01, secreted form
SEQ ID NO: 1405: target sequence IGHD, name IGHD*02, secreted form
SEQ ID NO: 1406: target sequence IGHD, name IGHD*02, membrane bound form
SEQ ID NO: 1407: target sequence IGHE, name IGHE*01, membrane bound form
SEQ ID NO: 1408: target sequence IGHE, name IGHE*02, secreted form
SEQ ID NO: 1409: target sequence IGHE, name IGHE*03, membrane bound form
SEQ ID NO: 1410: target sequence IGHE, name IGHE*04, secreted form
SEQ ID NO: 1411: target sequence IGHE, name IGHE*04, membrane bound form
SEQ ID NO: 1412: target sequence IGHG, name IGHG1*02, secreted form
SEQ ID NO: 1413: target sequence IGHG, name IGHG1*03, secreted form
SEQ ID NO: 1414: target sequence IGHG, name IGHG2*0, secreted form
SEQ ID NO: 1415: target sequence IGHG, name IGHG2*01, membrane bound form
SEQ ID NO: 1416: target sequence IGHG, name IGHG2*02, secreted form
SEQ ID NO: 1417: target sequence IGHG, name IGHG2*03, secreted form
SEQ ID NO: 1418: target sequence IGHG, name IGHG2*04, secreted form
SEQ ID NO: 1419: target sequence IGHG, name IGHG2*05, secreted form
SEQ ID NO: 1420: target sequence IGHG, name IGHG2*06, secreted form
SEQ ID NO: 1421: target sequence IGHG, name IGHG2*06, membrane bound form
SEQ ID NO: 1422: target sequence IGHG, name IGHG3*01, secreted form
SEQ ID NO: 1423: target sequence IGHG, name IGHG3*01, membrane bound form
SEQ ID NO: 1424: target sequence IGHG, name IGHG3*03, secreted form
SEQ ID NO: 1425: target sequence IGHG, name IGHG3*03, membrane bound form
SEQ ID NO: 1426: target sequence IGHG, name IGHG3*04, secreted form
SEQ ID NO: 1427: target sequence IGHG, name IGHG3*05, secreted form
SEQ ID NO: 1428: target sequence IGHG, name IGHG3*06, secreted form
SEQ ID NO: 1429: target sequence IGHG, name IGHG3*07, secreted form
SEQ ID NO: 1430: target sequence IGHG, name IGHG3*08, secreted form
SEQ ID NO: 1431: target sequence IGHG, name IGHG3*09, secreted form
SEQ ID NO: 1432: target sequence IGHG, name IGHG3*10, secreted form
SEQ ID NO: 1433: target sequence IGHG, name IGHG3*11, secreted form
SEQ ID NO: 1434: target sequence IGHG, name IGHG3*12, secreted form
SEQ ID NO: 1435: target sequence IGHG, name IGHG3*13, secreted form
SEQ ID NO: 1436: target sequence IGHG, name IGHG3*14, secreted form
SEQ ID NO: 1437: target sequence IGHG, name IGHG3*15, secreted form
SEQ ID NO: 1438: target sequence IGHG, name IGHG3*16, secreted form
SEQ ID NO: 1439: target sequence IGHG, name IGHG3*17, secreted form
SEQ ID NO: 1440: target sequence IGHG, name IGHG3*18, secreted form
SEQ ID NO: 1441: target sequence IGHG, name IGHG3*19, secreted form
SEQ ID NO: 1442: target sequence IGHG, name IGHG4*01, secreted form
SEQ ID NO: 1443: target sequence IGHG, name IGHG4*02, secreted form
SEQ ID NO: 1444: target sequence IGHG, name IGHG4*03, secreted form
SEQ ID NO: 1445: target sequence IGHG, name IGHG4*04, secreted form
SEQ ID NO: 1446: target sequence IGHG, name IGHG4*04, membrane bound form
SEQ ID NO: 1447: target sequence IGHM, name IGHM*01, membrane bound form
SEQ ID NO: 1448: target sequence IGHM, name IGHM*03, secreted form
SEQ ID NO: 1449: target sequence IGHM, name IGHM*03, membrane bound form
SEQ ID NOs: 1450-1499: TRA reads (top 50) (Table 3-1)
SEQ ID NOs: 1500-1549: TRB reads (top 50) (Table 3-2)
SEQ ID NOs: 1550-1587: TCRα chain read sequence overlapping in healthy individuals (Table 3-7)
SEQ ID NOs: 1588-1626: TCRβ chain read sequence overlapping in healthy individuals (Table 3-8)
SEQ ID NOs: 1627-1647: invariant TCR candidate gene (Table 3-9)
SEQ ID NOs: 1648-1860: overlapping TCRα read sequence and cancer specific TCRα read in cancer patients (Table 3-11)
SEQ ID NOs: 1861-1909: overlapping TCRβ read sequence and cancer specific TCRβ in cancer patients (Table 3-12)
SEQ ID NOs: 1910-1921: P5-P20EA primer
SEQ ID NOs: 1922-1929: P7-CA3 primer
SEQ ID NOs: 1930-1937: P7-CB3 primer
SEQ ID NOs: 1938-1992: invariant TCR sequence observed in public TCRα sequence identified in Example 5 of analysis system

1. A method of quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database, wherein the method comprises the steps of: (1) providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner; (2) determining the nucleic acid sequence comprised in the nucleic acid sample; and (3) calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject, wherein the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR) and the step (2) determines the nucleic acid sequence by a single sequencing performed with a common adaptor primer.
2. The method of claim 1, wherein the step (1) comprises the following steps: (1-1) synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template; (1-2) synthesizing a double stranded complementary DNA by using the complementary DNA as a template; (1-3) synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA; (1-4) performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer, wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified; (1-5) performing a second PCR amplification reaction by using a PCR amplicon of (1-4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but comprise a sequence that is not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified; and (1-6) performing a third PCR amplification reaction by using a PCR amplicon of (1-5), an added common adaptor primer in which a nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence is added to a third TCR or BCR C region specific sequence, wherein the third TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the second TCR or BCR C region specific primer, but comprise a sequence that is not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified.
3. The method of claim 1, wherein the single sequencing is characterized in that at least one of the sequences used as a primer in amplification from the nucleic acid sample into a sample for sequencing has the same sequence as a nucleic acid sequence encoding a C region or a complementary strand thereof.
4. The method of claim 1, wherein the unbiased amplification is not V region specific amplification.
5. The method of claim 1, wherein the repertoire is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence.
6. The method of claim 1, wherein (3) derivation of the TCR or BCR repertoire is accomplished by a method comprising the following steps: (3-1) providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (3-2) providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3-3) searching for homology of the input sequence set with the reference database for the each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (3-4) assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of assigning; (3-5) translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and (3-6) calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region or a frequency of appearance of a combination thereof based on the classifying in (3-5) to derive the TCR or BCR repertoire.
7. A system for quantitatively analyzing a repertoire of a variable region of a T cell receptor (TCR) or a B cell receptor (BCR) of a subject by using a database, wherein the system comprises: (1) a kit for providing a nucleic acid sample comprising a nucleic acid sequence of the T cell receptor (TCR) or the B cell receptor (BCR) which is amplified from the subject in an unbiased manner; (2) an apparatus for determining the nucleic acid sequence comprised in the nucleic acid sample; and (3) an apparatus for calculating a frequency of appearance of each gene or a combination thereof based on the determined nucleic acid sequence to derive a TCR or BCR repertoire of the subject, wherein the nucleic acid sample comprises nucleic acid sequences of a plurality of types of T cell receptors (TCR) or B cell receptors (BCR) and the step (2) determines the nucleic acid sequence by a single sequencing performed with a common adaptor primer.
8. The system of claim 7, wherein the repertoire is the repertoire of a variable region of a BCR, and the nucleic acid sequence is a BCR nucleic acid sequence.
9. The system of claim 7, wherein the kit comprises the following: (1-1) means for synthesizing a complementary DNA by using an RNA sample derived from a target cell as a template; (1-2) means for synthesizing a double stranded complementary DNA by using the complementary DNA as a template; (1-3) means for synthesizing an adaptor-added double stranded complementary DNA by adding a common adaptor primer sequence to the double stranded complementary DNA; (1-4) means for performing a first PCR amplification reaction by using the adaptor-added double stranded complementary DNA, a common adaptor primer consisting of the common adaptor primer sequence, and a first TCR or BCR C region specific primer, wherein the first TCR or BCR C region specific primer is designed to comprise a sequence that is sufficiently specific to a C region of interest of the TCR or BCR and not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified; (1-5) means for performing a second PCR amplification reaction by using a PCR amplicon of (1-4), the common adaptor primer, and a second TCR or BCR C region specific primer, wherein the second TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the first TCR or BCR C region specific primer, but comprise a sequence that is not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified; and (1-6) means for performing a third PCR amplification reaction by using a PCR amplicon of (1-5), an added common adaptor primer in which a nucleic acid sequence of the common adaptor primer comprises a first additional adaptor nucleic acid sequence, and an adaptor-added third TCR or BCR C region specific primer in which a second additional adaptor nucleic acid sequence is added to a third TCR or BCR C region specific sequence, wherein the third TCR or BCR C region specific primer is designed to have a sequence that is a complete match with the TCR or BCR C region in a sequence downstream of the sequence of the second TCR or BCR C region specific primer, but comprise a sequence that is not homologous with other genetic sequences, and comprise a mismatching base between subtypes downstream when amplified.
10. The system of claim 7, wherein (3) an apparatus for deriving the TCR or BCR repertoire comprises the following: (3-1) means for providing a reference database for each gene region comprising at least one of a V region, a D region, a J region and optionally a C region; (3-2) means for providing an input sequence set which is optionally trimmed and optionally extracted to have a suitable length; (3-3) means for searching for homology of the input sequence set with the reference database for the each gene region and recording an alignment with an approximate reference allele and/or a sequence of the reference allele; (3-4) means for assigning the V region and the J region for the input sequence set and extracting a nucleic acid sequence of the D region based on a result of assigning; (3-5) means for translating the nucleic acid sequence of the D region into an amino acid sequence and classifying the D region by utilizing the amino acid sequence; and (3-6) means for calculating a frequency of appearance for each of the V region, the D region, and the J region and optionally the C region or a frequency of appearance of a combination thereof based on the classifying in (3-5) to derive the TCR or BCR repertoire.
11. A method of preparing a composition for use in a cancer idiotype peptide sensitization immune cell therapeutic method to a subject, the method comprising: (1) analyzing a T cell receptor (TCR) or B cell receptor (BCR) repertoire of the subject by the method of claim 1; (2) determining a TCR or BCR derived from a cancer cell of the subject, wherein the cancer is leukemia such as Acute T-lymphoblastic leukemia, Chronic lymphocytic leukemia, Chronic myelogenous leukemia, adult T-cell leukemia, T-cell large granular lymphocyte leukemia, and malignant lymphoma, based on a result of the analysis, wherein the determining is done by selecting a high ranking sequence in a frequency of presence ranking of a TCR or BCR gene derived from the cancer cell of the subject as the TCR or BCR derived from the cancer cell; (3) determining an amino acid sequence of a candidate HLA test peptide based on the determined TCR or BCR derived from cancer, wherein the determining is performed based on a score calculated by using an HLA binding peptide prediction algorithm; and (4) synthesizing the determined peptide.
12. A method of preparing an isolated cancer specific TCR gene by an in vitro antigen stimulation, comprising: (A) mixing an antigen peptide or antigen protein derived from a subject or the determined peptide of claim 11 or a lymphocyte derived from the subject, an inactivated cancer cell derived from the subject, and a T lymphocyte derived from the subject and culturing the mixture to produce a tumor specific T cell; (B) analyzing a TCR of the tumor specific T cell by the method of claim 1; and (C) isolating a desired tumor specific T cell based on a result of the analyzing.
13. An in vitro method of preparing a T lymphocyte introduced with a tumor specific TCR gene for use in a cell processing therapeutic method, comprising: A) providing a T lymphocyte collected from a patient; B) analyzing TCRs based on the method of claim 1 after applying an antigen stimulation to the T lymphocyte, wherein the antigen stimulation is applied by an antigen peptide or antigen protein derived from the subject, an inactivated cancer cell derived from the subject, or an idiotype peptide derived from tumor; C) selecting an optimal TCR and an optimal antigen in the analyzed TCRs; and D) producing a tumor specific α and β TCR expression viral vector of a TCR gene of the optimal TCR.
14. The method of claim 2, wherein the first, second and third TCR or BCR C region specific primers are each independently a primer for TCR or BCR repertoire analysis, the primer being selected to be a sequence that is a complete match with each isotype C region of IgM, IgG, IgA, IgD or IgE, and for a BCR, a complete match with subtypes for IgG and IgA, and not homologous with other sequences comprised in the database, and comprise a mismatching base between subtypes downstream of the primer, and wherein the common adaptor primer sequence is designed such that the sequence has a base length suitable for amplification, is unlikely to have homodimer and intramolecular hairpin structures, and is able to stably form a double strand, and designed not to be highly homologous with all BCR genetic sequences in the database and to have the same level of Tm as the C region specific primer.
15. The method of claim 2, wherein the first TCR or BCR C region specific primer has the following structure: CM1 (SEQ ID NO: 5), CA1 (SEQ ID NO: 8), CG1 (SEQ ID NO: 11), CD1 (SEQ ID NO: 14), CE1 (SEQ ID NO: 17), CA1 (SEQ ID NO: 35) or CB1 (SEQ ID NO: 37).
16. The method of claim 2, wherein the second TCR or BCR C region specific primer has the following structure: CM2 (SEQ ID NO: 6), CA2 (SEQ ID NO: 9), CG2 (SEQ ID NO: 12), CD2 (SEQ ID NO: 15), CE2 (SEQ ID NO: 18), CA2 (SEQ ID NO: 35), or CB2 (SEQ ID NO: 37).
17. The method of claim 2, wherein each of the TCR or BCR C region specific primers is provided in a set compatible with all TCR or BCR subclasses.
18. The method of claim 13, wherein the antigen stimulation is applied with the antigen peptide or antigen protein derived from the subject.
19. The method of claim 13, wherein the antigen stimulation is applied with the inactivated cancer cell derived from the subject.
20. The method of claim 13, wherein the step C) comprises selecting an antigen that is highly expressed in cancer tissue of the subject.
21. The method of claim 13, wherein the step C) comprises selecting an antigen which most strongly activates a T cell in an antigen specific lymphocyte stimulation test.
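Steps (3-4) to (3-6) of claims 6 and 10 end in a frequency calculation. The following is a minimal Python sketch of that last stage. It assumes that V and J assignment (3-4) has already been done by a homology search and that each read has been reduced to a tuple of V gene, J gene and the nucleotide sequence extracted between them (the "D region" of the claims). It translates that sequence (3-5) and reports read frequencies for V, J, V-J combinations and translated sequences (3-6). The function names and the dictionary keys are hypothetical. The codon table is the standard genetic code, and the gene names in the usage example are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def translate(nt):
    """Translate an in-frame nucleotide sequence; a trailing partial codon is dropped."""
    nt = nt.upper()
    # Codons with ambiguous bases (e.g. N) are reported as X.
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def _to_freq(counter, n):
    return {key: count / n for key, count in counter.items()}


def repertoire(assignments):
    """assignments: iterable of (v_gene, j_gene, junction_nt), one tuple per read.

    Returns read frequencies for V genes, J genes, V-J pairs and translated
    junction sequences.
    """
    v, j, vj, aa = Counter(), Counter(), Counter(), Counter()
    n = 0
    for v_gene, j_gene, junction_nt in assignments:
        v[v_gene] += 1
        j[j_gene] += 1
        vj[(v_gene, j_gene)] += 1
        aa[translate(junction_nt)] += 1  # classify by amino acid sequence
        n += 1
    return {"V": _to_freq(v, n), "J": _to_freq(j, n),
            "VJ": _to_freq(vj, n), "AA": _to_freq(aa, n)}


reads = [
    ("TRBV5-1", "TRBJ2-7", "TGTGCCAGC"),
    ("TRBV5-1", "TRBJ1-1", "TGTGCC"),
    ("TRBV20-1", "TRBJ2-7", "TGTGCCAGC"),
    ("TRBV5-1", "TRBJ2-7", "TGTGCCAGC"),
]
rep = repertoire(reads)
```

Counting translated sequences rather than nucleotide sequences merges reads that differ only by synonymous codons, which matches the classification "by utilizing the amino acid sequence" in step (3-5).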